Advanced Topics
This page gives further details about the classes and functions located in the GitHub repository.
inference/linker.py
class EntityLinker
Creates a pipeline of an entity recognition transformer and a sentence transformer for embedding text.
Initialization Parameters
entity_model : str, default='tabiya/roberta-base-job-ner' Path to a pre-trained AutoModelForTokenClassification model or an AutoModelCrfForNer model. This model is used for entity recognition within the input text.
similarity_model : str, default='all-MiniLM-L6-v2' Path or name of a sentence transformer model used for embedding text. The sentence transformer is used to compute embeddings for the extracted entities and the reference sets. The model 'all-mpnet-base-v2' is also available but has no precomputed embeddings in the cache, so it should be used with the parameter from_cache=False at least the first time.
crf : bool, default=False A flag to indicate whether to use an AutoModelCrfForNer model instead of a standard AutoModelForTokenClassification. CRF (Conditional Random Field) models are used when the task requires sequential predictions with dependencies between the outputs.
evaluation_mode : bool, default=False If set to True, the linker will return the cosine similarity scores between the embeddings. This mode is useful for evaluating the quality of the linkages.
k : int, default=32 Specifies the number of items to retrieve from the reference sets. This parameter limits the number of top matches to consider when linking entities.
from_cache : bool, default=True If set to True, the precomputed embeddings are loaded from cache to save time. If set to False, the embeddings are computed on the fly, which requires GPU access for efficiency and can be time-consuming.
output_format : str, default='occupation' Specifies the format of the output for occupations: either occupation, preffered_label, esco_code, uuid, or all to get all the columns. The uuid is also available for the skills.
Calling Parameters
text : str An arbitrary job vacancy-related string.
linking : bool, default=True Specifies whether the model performs entity linking to the taxonomy.
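A minimal usage sketch based only on the parameters documented above; the import path mirrors the file layout of this page (inference/linker.py) and the example text is illustrative, so adjust both to your installation.

```python
# Minimal usage sketch; the import path follows inference/linker.py above and
# may need adjusting to your local setup.
from inference.linker import EntityLinker

# Defaults shown explicitly: job-NER entity model plus a MiniLM sentence transformer.
linker = EntityLinker(
    entity_model='tabiya/roberta-base-job-ner',
    similarity_model='all-MiniLM-L6-v2',
    crf=False,                   # set True if entity_model is an AutoModelCrfForNer checkpoint
    k=32,                        # number of items retrieved from the reference sets
    from_cache=True,             # load precomputed embeddings instead of recomputing them
    output_format='occupation',
)

text = "We are hiring a data analyst with strong SQL and Python skills."
# linking=True (the default) links the extracted entities to the taxonomy.
result = linker(text, linking=True)
print(result)
```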
class FrenchEntityLinker
French version of the entity linker. To use it, the reference databases need to be rewritten with the French version of ESCO.
inference/evaluator.py
class Evaluator(EntityLinker)
Evaluator class that inherits from EntityLinker. It builds the queries, corpus, inverted corpus, and relevant documents for the InformationRetrievalEvaluator, performs entity linking, and computes the information retrieval metrics.
Initialization Parameters
entity_type : str One of Occupation, Skill, or Qualification; determines the evaluation set to be used.
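A short construction sketch under the same assumptions as above (module path inferred from the heading; the remaining arguments are the ones inherited from EntityLinker):

```python
# Sketch only: the module path is inferred from the heading above, and the
# constructor arguments are those inherited from EntityLinker.
from inference.evaluator import Evaluator

# entity_type selects the evaluation set; evaluation_mode exposes the cosine
# similarity scores used when judging the linkages.
evaluator = Evaluator(entity_type='Occupation', evaluation_mode=True)
```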
util/transformersCRF.py
class CRF(nn.Module)
A class that creates a linear-chain Conditional Random Field model, adapted from an external implementation (linked in the source code).
class AutoModelForCrfPretrainedConfig(PretrainedConfig)
Configuration class that inherits from the HuggingFace PretrainedConfig class.
class AutoModelCrfForNer(PreTrainedModel)
A general class that inherits from the HuggingFace PreTrainedModel class. The model_type is detected automatically.
model_type : str Possible options include BertCrfForNer, RobertaCrfForNer, and DebertaCrfForNer.
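A hedged loading sketch: the import path follows the heading above, the checkpoint path is hypothetical, and because the class inherits from PreTrainedModel, the standard from_pretrained interface is assumed to apply.

```python
# Sketch: load a CRF NER model through the auto class. The checkpoint path is
# hypothetical; point it at an actual AutoModelCrfForNer checkpoint.
from transformers import AutoTokenizer
from util.transformersCRF import AutoModelCrfForNer

checkpoint = 'path/to/crf-ner-checkpoint'  # hypothetical
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# model_type (BertCrfForNer, RobertaCrfForNer or DebertaCrfForNer) is detected
# automatically from the checkpoint configuration.
model = AutoModelCrfForNer.from_pretrained(checkpoint)
```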
class BERT_CRF_Config(PretrainedConfig)
Custom class used for configuring BERT for CRF.
class BertCrfForNer(PreTrainedModel)
BERT-based CRF model that inherits from the HuggingFace PreTrainedModel class.
Initialization parameters: same as the HuggingFace PreTrainedModel.
Forward Parameters
Same as the HuggingFace PreTrainedModel, except for special_tokens_mask (default: None). We use this HuggingFace option as a small hack to implement the special mask needed for the CRF.
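A hedged sketch of a forward pass that supplies special_tokens_mask; the checkpoint path is hypothetical, and the mask is taken from the tokenizer via return_special_tokens_mask=True.

```python
# Sketch: pass the special-token mask alongside the usual encoder inputs.
import torch
from transformers import AutoTokenizer
from util.transformersCRF import BertCrfForNer

checkpoint = 'path/to/bert-crf-checkpoint'  # hypothetical
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = BertCrfForNer.from_pretrained(checkpoint)

enc = tokenizer(
    "Looking for a senior data engineer",
    return_tensors='pt',
    return_special_tokens_mask=True,
)
special_tokens_mask = enc.pop('special_tokens_mask')

with torch.no_grad():
    outputs = model(**enc, special_tokens_mask=special_tokens_mask)
```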
class ROBERTA_CRF_Config(PretrainedConfig)
Custom class used for configuring RoBERTa for CRF.
class RobertaCrfForNer(PreTrainedModel)
RoBERTa-based CRF model that inherits from the HuggingFace PreTrainedModel class.
Initialization parameters: same as the HuggingFace PreTrainedModel.
Forward Parameters
Same as the HuggingFace PreTrainedModel, except for special_tokens_mask (default: None). We use this HuggingFace option as a small hack to implement the special mask needed for the CRF.
class DEBERTA_CRF_Config(PretrainedConfig)
Custom class used for configuring DeBERTa for CRF.
class DebertaCrfForNer(PreTrainedModel)
DeBERTa-based CRF model that inherits from the HuggingFace PreTrainedModel class.
Initialization parameters: same as the HuggingFace PreTrainedModel.
Forward Parameters
Same as the HuggingFace PreTrainedModel, except for special_tokens_mask (default: None). We use this HuggingFace option as a small hack to implement the special mask needed for the CRF.
util/utilfunctions.py
class Config
Configuration class for the training hyperparameters.
class CPU_Unpickler
A class that loads pickled tensors onto the CPU.
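The class name suggests the common torch unpickling pattern; the sketch below shows that pattern under this assumption, and the actual implementation in the repository may differ slightly.

```python
# Sketch of the usual CPU_Unpickler pattern: redirect torch storage loading so
# tensors pickled on a GPU machine can be deserialized on a CPU-only machine.
import io
import pickle
import torch

class CPU_Unpickler(pickle.Unpickler):
    def find_class(self, module, name):
        if module == 'torch.storage' and name == '_load_from_bytes':
            return lambda b: torch.load(io.BytesIO(b), map_location='cpu')
        return super().find_class(module, name)

# Usage (hypothetical file name):
# embeddings = CPU_Unpickler(open('embeddings.pkl', 'rb')).load()
```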