
Getting Started


Installation

Prerequisites

  • A recent version of git (e.g. ^2.37)

  • Python 3.10 or higher

  • Poetry 1.8 or higher

    Note: To install Poetry, consult the Poetry documentation. Install Poetry system-wide (not in a virtualenv).

Using Git LFS

This tool uses Git LFS for handling large files. Before using it, you need to install and set up Git LFS on your local machine. See https://git-lfs.com/ for installation instructions.

After Git LFS is set up, follow these steps to clone the repository:

git clone https://github.com/tabiya-tech/tabiya-livelihoods-classifier.git

If you already cloned the repository without Git LFS, run:

git lfs pull

Install the dependencies

Set up virtualenv

In the root directory of the project (the same directory that contains the README file), run the following commands:

# create a virtual environment
python3 -m venv venv

# activate the virtual environment
source venv/bin/activate
# Use the version of the dependencies specified in the lock file
poetry lock --no-update
# Install missing and remove unreferenced packages
poetry install --sync

Note: To also install the dependencies needed for training, use:

# Use the version of the dependencies specified in the lock file
poetry lock --no-update
# Install missing and remove unreferenced packages
poetry install --sync --with train

Note: Before running any tasks, activate the virtual environment so that the installed dependencies are available:

# activate the virtual environment
source venv/bin/activate

To deactivate the virtual environment, run:

# deactivate the virtual environment
deactivate

Start a Python interpreter and download the NLTK punkt package, which is required by the sentence tokenizer. You only need to download punkt once.

python <<EOF
import nltk
nltk.download('punkt')
EOF
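
To confirm the download succeeded, you can run a quick sanity check that tokenizes a short sample with the punkt-based sentence tokenizer:

python <<EOF
from nltk.tokenize import sent_tokenize
# punkt powers sent_tokenize; this should print a list of two sentences
print(sent_tokenize("Punkt is installed. Tokenization works."))
EOF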

Environment Variable & Configuration

The tool uses the following environment variable:

HF_TOKEN: To use the project, you need access to the Hugging Face 🤗 entity extraction model. Contact the administrators via tabiya@benisis.de to request access, then create a read access token in your Hugging Face account settings. The backend supports the use of a .env file to set the environment variable. Create a .env file in the root directory of the backend project and set it as follows:

# .env file
HF_TOKEN=<YOUR_HF_TOKEN>

ATTENTION: The .env file should be kept secure and not shared with others, as it contains sensitive information.
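
If you want to verify that the token is picked up, here is a minimal sketch; it assumes the python-dotenv package is available in your environment:

# quick check that HF_TOKEN is readable from the .env file
# (assumes the python-dotenv package is installed)
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory
print("HF_TOKEN set:", os.getenv("HF_TOKEN") is not None)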

QuickStart Guide

Inference Pipeline

The inference pipeline extracts occupations and skills from a job description and matches them to the most similar entities in the ESCO taxonomy.

Usage

First, activate the virtual environment as explained in the Installation section above. Then, start a Python interpreter in the root directory.

Load the EntityLinker class, create an instance, and perform inference on any text with the following code:

from inference.linker import EntityLinker
pipeline = EntityLinker(k=5)
text = 'We are looking for a Head Chef who can plan menus.'
extracted = pipeline(text)
print(extracted)

After running the commands above, you should see the following output:

[
  {'type': 'Occupation', 'tokens': 'Head Chef', 'retrieved': ['head chef', 'industrial head chef', 'head pastry chef', 'chef', 'kitchen chef']},
  {'type': 'Skill', 'tokens': 'plan menus', 'retrieved': ['plan menus', 'plan patient menus', 'present menus', 'plan schedule', 'plan engineering activities']}
]
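
Judging from the output above, the k argument passed to the constructor controls how many candidate ESCO entities are retrieved for each extracted span (five with k=5). As a sketch, assuming the same constructor, you can narrow the candidate list like this:

# retrieve only the top 3 most similar ESCO entities per extracted span
pipeline_top3 = EntityLinker(k=3)
print(pipeline_top3('We are looking for a Head Chef who can plan menus.'))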

French version

You can use the French version of the Entity Linker with the following code (the sample text is the French equivalent of the job description above):

from inference.linker import FrenchEntityLinker
pipeline = FrenchEntityLinker(entity_model='tabiya/camembert-large-job-ner', similarity_model='intfloat/multilingual-e5-base')

text = 'Nous recherchons un chef de cuisine capable de planifier les menus.'
extracted = pipeline(text)
print(extracted)

You should see the following output:

[
  {'type': 'Occupation', 'tokens': 'chef de cuisine', 'retrieved': ['chef de cuisine', 'chef de marque', 'chef mécanicien', 'chef cuisinier/cheffe cuisinière', 'chef de train']}, 
  {'type': 'Skill', 'tokens': 'planifier les menus', 'retrieved': ['planifier les menus', 'présenter des menus', 'établir les menus des patients', 'préparer des plannings', 'préparer des plats préparés']}
]

Running the evaluation tests

Load the Evaluator class and print the results:

from inference.evaluator import Evaluator

results = Evaluator(entity_type='Skill', entity_model='tabiya/roberta-base-job-ner', similarity_model='all-MiniLM-L6-v2', crf=False, evaluation_mode=True)
print(results.output)

This class inherits from EntityLinker; the main difference is the entity_type flag (see the sketch below). If you want to run evaluations on custom datasets, you will need to modify the _load_dataset function in the evaluation.py file. Please refer to the original evaluation datasets described in the Datasets section. If you have any trouble, please open an issue on GitHub.
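
As an illustration of the entity_type flag, here is a sketch that evaluates occupation linking instead; entity_type='Occupation' is an assumption by analogy with the 'Occupation' type in the inference output above:

# same configuration as before, but evaluating occupation entities
# (entity_type='Occupation' is assumed by analogy with the 'Skill' run)
occupation_results = Evaluator(entity_type='Occupation', entity_model='tabiya/roberta-base-job-ner', similarity_model='all-MiniLM-L6-v2', crf=False, evaluation_mode=True)
print(occupation_results.output)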

Minimum Hardware

  • 4 GB CPU/GPU RAM

The code runs on a GPU if one is available. If you plan to run on a GPU, ensure your machine has CUDA installed.
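
To check whether a GPU will actually be used, you can query PyTorch directly (PyTorch is assumed to be installed as part of the tool's dependencies):

# report whether PyTorch can see a CUDA device
import torch

print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))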
