
Datasets


Last updated 2 months ago

Reference Sets

Occupations

  • Location: inference/files/occupations_augmented.csv

  • Source: ESCO dataset - v1.1.1

  • Description: ESCO (European Skills, Competences, Qualifications and Occupations) is the European multilingual classification of Skills, Competences, and Occupations. This dataset includes information relevant to the occupations.

  • License: Creative Commons Attribution 4.0 International; see DATA_LICENSE for details.

  • Modifications: The columns retained are alt_label, preferred_label, esco_code, and uuid. Each alternative label has been placed in its own row.
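The row-splitting step described above can be sketched as follows. This is a minimal illustration, not the project's actual preprocessing script: it assumes the alternative labels arrive as a single newline-separated field, and the function name `explode_alt_labels` and the sample values are hypothetical.

```python
import csv
import io


def explode_alt_labels(rows, sep="\n"):
    """Yield one output row per alternative label.

    Assumption: alternative labels are packed into one field and
    separated by `sep`; adjust this for the real export.
    """
    for row in rows:
        labels = [part.strip() for part in row["alt_label"].split(sep) if part.strip()]
        # Keep occupations that have no alternative label at all.
        for label in labels or [""]:
            yield {
                "alt_label": label,
                "preferred_label": row["preferred_label"],
                "esco_code": row["esco_code"],
                "uuid": row["uuid"],
            }


# Tiny in-memory stand-in for the source file (values are made up).
raw = io.StringIO(
    "preferred_label,alt_label,esco_code,uuid\n"
    'baker,"bread maker\npastry baker",7512.1,u-1\n'
)
exploded = list(explode_alt_labels(csv.DictReader(raw)))
for item in exploded:
    print(item)
```

Keeping one alternative label per row lets each label be indexed and matched on its own while still pointing back to the same esco_code and uuid.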

Skills

  • Location: inference/files/skills.csv

  • Source: ESCO dataset - v1.1.1

  • Description: ESCO (European Skills, Competences, Qualifications and Occupations) is the European multilingual classification of Skills, Competences and Occupations. This dataset includes information relevant to the skills.

  • License: Creative Commons Attribution 4.0 International; see Data License for details.

  • Modifications: The columns retained are preferred_label and uuid.

Qualifications

  • Location: inference/files/qualifications.csv

  • Source: Official European Union EQF comparison website

  • Description: This dataset contains EQF (European Qualifications Framework) relevant information extracted from the official EQF comparison website. It includes data strings, country information, and EQF levels. Non-English text was ignored.

  • License: Please refer to the original source for license information.

  • Modifications: Non-English text was removed, and the remaining information was formatted into a structured database.

For the French version of the tool, we use the French version of ESCO v1.1.1 together with a translation of the qualifications produced with the Google Translation API.

Training Sets

Entity Extraction

  • Description: This dataset provides a comprehensive benchmark suite for Entity Recognition (ER) in job descriptions. Developed to fill the significant gap in resources for extracting key entities like skills from job descriptions, the dataset features 18.6k annotated entities across five categories: Skill, Qualification, Experience, Occupation, and Domain.

  • License: CC-BY-NC-4.0

  • Modifications: No modifications were made to the original dataset. It was only converted to HuggingFace format.

Entity Similarity

  • Location: TBD

  • Description:

    The hahu_test.csv file is the original file provided by Hahu Jobs with the following fields:

    • title: The title of the job position, indicating the specific role and/or position within the organization.

    • esco_label: The preferred or alternative label provided by ESCO, matching the corresponding ESCO code.

    • esco_code: The ESCO code associated with the job, facilitating standardized classification and comparison across different job listings.

  • License: CC-BY-NC-4.0

  • Modifications: The occupation title and the relevant ESCO code were extracted and matched with the ESCO preferred and alternative labels.
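As a rough sketch of how the three fields listed above could be read into title/label pairs, the snippet below uses only the standard library. The helper name `load_title_label_pairs` and the sample rows are hypothetical; only the column names title, esco_label, and esco_code come from the description.

```python
import csv
import io


def load_title_label_pairs(handle):
    """Return (title, esco_label, esco_code) tuples, skipping incomplete rows."""
    pairs = []
    for row in csv.DictReader(handle):
        title = (row.get("title") or "").strip()
        label = (row.get("esco_label") or "").strip()
        code = (row.get("esco_code") or "").strip()
        # A pair is only useful if all three fields are present.
        if title and label and code:
            pairs.append((title, label, code))
    return pairs


# In-memory stand-in for the CSV file (values are made up).
sample = io.StringIO(
    "title,esco_label,esco_code\n"
    "Senior Accountant,accountant,2411.1\n"
    "Driver,,8322.1\n"
)
pairs = load_title_label_pairs(sample)
print(pairs)
```

To read from disk instead, pass a file opened with `open(path, newline="", encoding="utf-8")` in place of the in-memory sample.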

Evaluation Sets

Hahu Test

  • Location: inference/files/eval/redacted_hahu_test_with_id.csv

  • Description: This dataset consists of 542 entries chosen at random from the 11 general classes of the classification system used by the Ethiopian Hahu Jobs platform. 50 entries were selected from each class to create the final dataset.

  • License: Creative Commons Attribution 4.0 International; see Data License for details.

  • Modifications: No modifications were made to the selected entries.
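The per-class random selection described above can be sketched as below. This is an illustration under stated assumptions rather than the procedure actually used: the field name `category`, the function `sample_per_class`, and the synthetic entries are all hypothetical.

```python
import random
from collections import defaultdict


def sample_per_class(entries, class_key, per_class=50, seed=0):
    """Draw up to `per_class` random entries from every class."""
    rng = random.Random(seed)  # fixed seed keeps the draw reproducible
    by_class = defaultdict(list)
    for entry in entries:
        by_class[entry[class_key]].append(entry)

    sampled = []
    for name in sorted(by_class):  # sorted for a stable class order
        group = by_class[name]
        sampled.extend(rng.sample(group, min(per_class, len(group))))
    return sampled


# Synthetic stand-in: 11 classes with 100 entries each.
entries = [{"id": i, "category": f"class_{i % 11}"} for i in range(1100)]
subset = sample_per_class(entries, "category")
print(len(subset))
```

In this sketch, a class with fewer entries than `per_class` contributes all of its entries, so the total can fall below the number of classes times `per_class`.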

House and Tech

  • Location:

    • inference/files/eval/house_test_annotations.csv

    • inference/files/eval/house_validation_annotations.csv

    • inference/files/eval/tech_test_annotations.csv

    • inference/files/eval/tech_validation_annotations.csv

  • Description: The dataset includes the HOUSE and TECH extensions of the SkillSpan Dataset. In the original work by Decorte et al., the test and development entities of the SkillSpan Dataset were annotated against the ESCO model.

  • License: MIT. Please refer to the original source.

  • Modifications: The datasets were used as provided without further modifications.

Qualification Mapping

  • Location: inference/files/eval/qualification_mapping.csv

  • Description: This dataset maps the Green Benchmark Qualifications to the appropriate EQF levels. Two annotators tagged the qualifications, resulting in a Cohen's Kappa agreement of 0.45, indicating moderate agreement.

  • License: Creative Commons Attribution 4.0 International; see Data License for details.

  • Modifications: The dataset was extended with EQF level mappings, and the annotations were verified by two annotators.
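For readers unfamiliar with the agreement statistic quoted above, Cohen's Kappa compares the observed agreement between two annotators with the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The sketch below is a plain standard-library implementation; the function name and the sample EQF levels are made up and are not taken from the dataset.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's Kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)

    # Observed agreement: share of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of each annotator's label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:  # both annotators used a single identical label
        return 1.0
    return (observed - expected) / (1 - expected)


# Made-up EQF levels assigned by two annotators to eight qualifications.
annotator_1 = [4, 5, 6, 6, 7, 4, 5, 6]
annotator_2 = [4, 5, 6, 7, 7, 4, 6, 6]
kappa = cohens_kappa(annotator_1, annotator_2)
print(round(kappa, 3))
```

Under the commonly used Landis and Koch scale, values between 0.41 and 0.60 are read as moderate agreement, which is the band the reported 0.45 falls into.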

Access and Usage

To use these datasets, ensure you comply with each original dataset's license and terms of use. Document any modifications you make and attribute them appropriately in your project.

For datasets requiring access tokens, such as those from HuggingFace 🤗, please contact the maintainers.

Locations and sources for the training and evaluation sets:

  • Entity Extraction: Location: job_ner_dataset. Source: Green Benchmark corpus.

  • Entity Similarity: Source: hahu-occupation-titles.

  • Hahu Test: Source: hahu_test.

  • House and Tech: Source: Provided by Decorte et al.

  • Qualification Mapping: Source: Extended from the Green Benchmark Qualifications.