This page gives further details about the classes and functions in the GitHub repository.
inference/linker.py
class EntityLinker
Creates a pipeline of an entity recognition transformer and a sentence transformer for embedding text.
Initialization Parameters
entity_model : str, default='tabiya/roberta-base-job-ner' Path to a pre-trained AutoModelForTokenClassification model or an AutoModelCrfForNer model. This model is used for entity recognition within the input text.
similarity_model : str, default='all-MiniLM-L6-v2' Path or name of a sentence transformer model used for embedding text. The sentence transformer is used to compute embeddings for the extracted entities and the reference sets. The model 'all-mpnet-base-v2' is available but not in cache, so it should be used with the parameter from_cache=False at least the first time.
crf : bool, default=False A flag to indicate whether to use an AutoModelCrfForNer model instead of a standard AutoModelForTokenClassification. CRF (Conditional Random Field) models are used when the task requires sequential predictions with dependencies between the outputs.
evaluation_mode : bool, default=False If set to True, the linker will return the cosine similarity scores between the embeddings. This mode is useful for evaluating the quality of the linkages.
k : int, default=32 Specifies the number of items to retrieve from the reference sets. This parameter limits the number of top matches to consider when linking entities.
from_cache : bool, default=True If set to True, the precomputed embeddings are loaded from cache to save time. If set to False, the embeddings are computed on-the-fly, which requires GPU access for efficiency and can be time-consuming.
output_format : str, default='occupation' Specifies the output format for occupations: one of occupation, preffered_label, esco_code, uuid, or all to return all the columns. The uuid format is also available for skills.
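The roles of similarity_model, k and evaluation_mode can be illustrated with a minimal sketch of top-k retrieval by cosine similarity. This is not the repository's implementation: the helper names (cosine_similarity, top_k_matches), the with_scores flag and the toy 3-dimensional vectors are assumptions made for illustration, and a real run would use embeddings produced by the sentence transformer.

```python
import math
from typing import Dict, List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_matches(entity_vec: List[float],
                  reference: Dict[str, List[float]],
                  k: int = 32,
                  with_scores: bool = False):
    """Hypothetical helper: rank reference labels by similarity, keep the top k.

    with_scores plays the role that evaluation_mode is described as playing:
    when True, the similarity score is returned next to each label.
    """
    scored = [(label, cosine_similarity(entity_vec, vec))
              for label, vec in reference.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    top = scored[:k]
    return top if with_scores else [label for label, _ in top]


# Toy "embeddings" (made-up values, not real model output).
reference_embeddings = {
    "software developer": [0.9, 0.1, 0.0],
    "data analyst": [0.6, 0.6, 0.1],
    "nurse": [0.0, 0.2, 0.9],
}
entity_embedding = [1.0, 0.0, 0.0]

labels = top_k_matches(entity_embedding, reference_embeddings, k=2)
scored = top_k_matches(entity_embedding, reference_embeddings, k=2,
                       with_scores=True)
print(labels)
print(scored)
```

A smaller k returns fewer candidates per entity, while a larger k trades speed for recall when the best match is not ranked first.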
Calling Parameters
text : str An arbitrary job vacancy-related string.
linking : bool, default=True Specifies whether the model links the extracted entities to the taxonomy.
class FrenchEntityLinker
French version of the entity linker. To use it, the reference databases must first be rewritten using the French version of ESCO.
inference/evaluator.py
class Evaluator(EntityLinker)
Evaluator class that inherits from EntityLinker. It builds the queries, corpus, inverted corpus and relevant docs for the InformationRetrievalEvaluator, performs entity linking and computes the information retrieval metrics.
Initialization Parameters
entity_type : str One of Occupation, Skill, or Qualification; determines which evaluation set is used.
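As a rough illustration of the structures the Evaluator prepares, the sketch below builds queries, corpus, inverted corpus and relevant docs from labelled pairs. It is an assumption about their shape, not the repository's code: build_ir_inputs, the id scheme (q0, d0, ...) and the sample data are hypothetical. The resulting queries, corpus and relevant docs follow the dictionary layout that the sentence-transformers InformationRetrievalEvaluator accepts (id to text, id to text, and query id to a set of relevant document ids).

```python
from typing import Dict, List, Set, Tuple


def build_ir_inputs(pairs: List[Tuple[str, str]],
                    reference_labels: List[str]):
    """Hypothetical helper: turn (mention, gold_label) pairs into IR inputs."""
    # corpus: document id -> reference label text
    corpus: Dict[str, str] = {
        f"d{i}": label for i, label in enumerate(reference_labels)
    }
    # inverted corpus: reference label text -> document id
    inverted_corpus: Dict[str, str] = {
        label: doc_id for doc_id, label in corpus.items()
    }

    queries: Dict[str, str] = {}             # query id -> mention text
    relevant_docs: Dict[str, Set[str]] = {}  # query id -> relevant doc ids
    query_ids: Dict[str, str] = {}           # mention text -> query id

    for mention, gold in pairs:
        if gold not in inverted_corpus:
            continue  # skip gold labels missing from the reference set
        if mention not in query_ids:
            qid = f"q{len(query_ids)}"
            query_ids[mention] = qid
            queries[qid] = mention
            relevant_docs[qid] = set()
        relevant_docs[query_ids[mention]].add(inverted_corpus[gold])

    return queries, corpus, inverted_corpus, relevant_docs


# Made-up sample data for illustration only.
reference_labels = ["software developer", "data analyst", "nurse"]
pairs = [
    ("python programmer", "software developer"),
    ("BI specialist", "data analyst"),
    ("python programmer", "data analyst"),
    ("forklift driver", "warehouse operator"),  # gold label not in reference set
]

queries, corpus, inverted_corpus, relevant_docs = build_ir_inputs(
    pairs, reference_labels
)
print(queries)
print(relevant_docs)
```

Keeping an inverted corpus makes the lookup from a gold label to its document id a constant-time dictionary access, which matters when the reference set is large.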