Getting Started

Installation

Prerequisites

Using Git LFS

This tool uses Git LFS to handle large files. Before using it, you need to install and set up Git LFS on your local machine. See https://git-lfs.com/ for installation instructions.

After Git LFS is set up, follow these steps to clone the repository:

git clone https://github.com/tabiya-tech/tabiya-livelihoods-classifier.git

If you have already cloned the repository without Git LFS, run:

git lfs pull

Install the dependencies

Set up virtualenv

In the root directory of the backend project (i.e., the same directory as this README file), run the following commands:

# create a virtual environment
python3 -m venv venv

# activate the virtual environment
source venv/bin/activate
# Use the version of the dependencies specified in the lock file
poetry lock --no-update
# Install missing and remove unreferenced packages
poetry install --sync

Note: To also install the dependencies needed for training, use:

# Use the version of the dependencies specified in the lock file
poetry lock --no-update
# Install missing and remove unreferenced packages
poetry install --sync --with train

Note: Before running any tasks, activate the virtual environment so that the installed dependencies are available:

# activate the virtual environment
source venv/bin/activate

To deactivate the virtual environment, run:

# deactivate the virtual environment
deactivate

Start the Python interpreter and download the NLTK punkt package, which the sentence tokenizer requires. You only need to download punkt once.

python <<EOF
import nltk
nltk.download('punkt')
EOF
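
To confirm that punkt is available, you can split a sample string into sentences; sent_tokenize is the NLTK helper that relies on the punkt data:

# quick sanity check: sentence-split a sample string with the punkt model
from nltk.tokenize import sent_tokenize

print(sent_tokenize('We are hiring a chef. The role involves planning menus.'))
# ['We are hiring a chef.', 'The role involves planning menus.']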

Environment Variable & Configuration

The tool uses the following environment variable:

  • HF_TOKEN: To use the project, you need access to the HuggingFace 🤗 entity extraction model. Contact the administrators via tabiya@benisis.de to request access. You must then create a read access token in your HuggingFace account settings to use the model. The backend supports the use of a .env file to set the environment variable. Create a .env file in the root directory of the backend project and set the environment variable as follows:

# .env file
HF_TOKEN=<YOUR_HF_TOKEN>

ATTENTION: The .env file should be kept secure and not shared with others as it contains sensitive information.
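
How the token is loaded is up to the backend's configuration code; as an illustration only, here is a minimal sketch that reads HF_TOKEN from a .env file, assuming the python-dotenv package (a hypothetical load path, not necessarily how the backend does it):

# minimal sketch: read HF_TOKEN from .env (assumes python-dotenv is installed)
import os
from dotenv import load_dotenv

load_dotenv()  # load variables from a .env file in the current directory
hf_token = os.environ.get('HF_TOKEN')
assert hf_token, 'HF_TOKEN is not set - check your .env file'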

QuickStart Guide

Inference Pipeline

The inference pipeline extracts occupations and skills from a job description and matches them to the most similar entities in the ESCO taxonomy.

Usage

First, activate the virtual environment as explained above.

Then, start the Python interpreter in the root directory.

Load the EntityLinker class, create an instance, and run inference on any text with the following code:

from inference.linker import EntityLinker

# k=5 retrieves the five most similar ESCO entities for each extraction
pipeline = EntityLinker(k=5)
text = 'We are looking for a Head Chef who can plan menus.'
extracted = pipeline(text)
print(extracted)

After running the commands above, you should see the following output:

[
  {'type': 'Occupation', 'tokens': 'Head Chef', 'retrieved': ['head chef', 'industrial head chef', 'head pastry chef', 'chef', 'kitchen chef']},
  {'type': 'Skill', 'tokens': 'plan menus', 'retrieved': ['plan menus', 'plan patient menus', 'present menus', 'plan schedule', 'plan engineering activities']}
]
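
Each element of the returned list is a dictionary holding the entity type, the extracted tokens, and the retrieved ESCO matches, so the results can be post-processed with ordinary list operations. For example, to collect the top match per entity type:

# group the best-ranked ESCO match of each extraction by entity type
top_matches = {}
for entity in extracted:
    top_matches.setdefault(entity['type'], []).append(entity['retrieved'][0])
print(top_matches)
# {'Occupation': ['head chef'], 'Skill': ['plan menus']}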

French version

You can use the French version of the Entity Linker with the following code:

from inference.linker import FrenchEntityLinker
pipeline = FrenchEntityLinker(entity_model='tabiya/camembert-large-job-ner', similarity_model='intfloat/multilingual-e5-base')

text = 'Nous recherchons un chef de cuisine capable de planifier les menus.'
extracted = pipeline(text)
print(extracted)

You should see the following output:

[
  {'type': 'Occupation', 'tokens': 'chef de cuisine', 'retrieved': ['chef de cuisine', 'chef de marque', 'chef mécanicien', 'chef cuisinier/cheffe cuisinière', 'chef de train']}, 
  {'type': 'Skill', 'tokens': 'planifier les menus', 'retrieved': ['planifier les menus', 'présenter des menus', 'établir les menus des patients', 'préparer des plannings', 'préparer des plats préparés']}
]

Running the evaluation tests

Load the Evaluator class and print the results:

from inference.evaluator import Evaluator

results = Evaluator(entity_type='Skill', entity_model='tabiya/roberta-base-job-ner', similarity_model='all-MiniLM-L6-v2', crf=False, evaluation_mode=True)
print(results.output)

This class inherits from EntityLinker, with the main difference being the 'entity_type' flag.
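
For example, assuming 'entity_type' also accepts 'Occupation' (the other entity type shown in the inference output above, not confirmed here), evaluating occupation extraction would look like this:

# sketch: evaluate occupation extraction ('Occupation' as entity_type is an assumption)
from inference.evaluator import Evaluator

results = Evaluator(entity_type='Occupation', entity_model='tabiya/roberta-base-job-ner', similarity_model='all-MiniLM-L6-v2', crf=False, evaluation_mode=True)
print(results.output)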

Minimum Hardware

  • 4 GB CPU/GPU RAM

The code runs on a GPU if one is available. If you intend to run on GPU, make sure CUDA is installed on your machine.
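
Assuming the models run on PyTorch, as is typical for HuggingFace transformer models, you can check whether a CUDA-capable GPU will be picked up with:

# check whether PyTorch can see a CUDA-capable GPU
import torch

print(torch.cuda.is_available())  # True means the code can run on GPU
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))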
