
# Training

Train your entity extraction model using PyTorch.

First, activate the virtual environment as explained [here](/tabiya-documentation/es/nuestra-pila-tecnologica/livelihoods-classifier/getting-started.md#dep).

### Train an Entity Extraction Model

Set the required hyperparameters in the `config.json` file. The default values are:

```json
{
    "model_name": "bert-base-cased",
    "crf": false,
    "dataset_path": "tabiya/job_ner_dataset",
    "label_list": ["O", "B-Skill", "B-Qualification", "I-Domain", "I-Experience", "I-Qualification", "B-Occupation", "B-Domain", "I-Occupation", "I-Skill", "B-Experience"],
    "model_max_length": 128,
    "batch_size": 32,
    "learning_rate": 1e-4,
    "epochs": 4,
    "weight_decay": 0.01,
    "save": false,
    "output_path": "bert_job_ner"
}
```
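
As a minimal sketch of how these values might be consumed, the snippet below parses the configuration and derives the label/id mappings that a token classification model typically needs. The helper `load_config` and the constant `DEFAULT_CONFIG_TEXT` are hypothetical names used for illustration only; the actual `train.py` may read the file differently.

```python
import json

# Hypothetical constant: the documented defaults, inlined so the sketch is
# self-contained. In practice the text would come from config.json.
DEFAULT_CONFIG_TEXT = """
{
    "model_name": "bert-base-cased",
    "crf": false,
    "dataset_path": "tabiya/job_ner_dataset",
    "label_list": ["O", "B-Skill", "B-Qualification", "I-Domain", "I-Experience", "I-Qualification", "B-Occupation", "B-Domain", "I-Occupation", "I-Skill", "B-Experience"],
    "model_max_length": 128,
    "batch_size": 32,
    "learning_rate": 1e-4,
    "epochs": 4,
    "weight_decay": 0.01,
    "save": false,
    "output_path": "bert_job_ner"
}
"""


def load_config(text):
    """Hypothetical helper: parse the config and build label <-> id mappings."""
    config = json.loads(text)
    labels = config["label_list"]
    # Duplicate labels would silently collapse in the mapping, so reject them.
    if len(labels) != len(set(labels)):
        raise ValueError("label_list contains duplicate labels")
    label2id = {label: index for index, label in enumerate(labels)}
    id2label = {index: label for label, index in label2id.items()}
    return config, label2id, id2label


config, label2id, id2label = load_config(DEFAULT_CONFIG_TEXT)
print(config["model_name"], len(label2id))
```

The position of each label in `label_list` becomes its integer id in this sketch, so reordering the list would change the mapping.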

To train the model, run the following script from the `train` directory:

```sh
python train.py
```

The training script is based on the [official HuggingFace tutorial on token classification](https://huggingface.co/docs/transformers/en/tasks/token_classification).
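
One step from that tutorial worth illustrating is label alignment: a tokenizer may split a word into several sub-tokens, so word-level labels have to be spread over the tokens. The sketch below follows the tutorial's convention of labelling only the first sub-token of each word and marking special tokens and remaining sub-tokens with `-100`, which the loss ignores. The function name `align_labels` and the sample inputs are hypothetical, and whether `train.py` uses exactly this strategy is an assumption.

```python
def align_labels(word_ids, word_labels, ignore_index=-100):
    """Hypothetical helper: map word-level label ids onto sub-tokens.

    word_ids has one entry per token: the index of the word the token came
    from, or None for special tokens such as [CLS] and [SEP].
    """
    aligned = []
    previous_word_id = None
    for word_id in word_ids:
        if word_id is None:
            # Special token: excluded from the loss.
            aligned.append(ignore_index)
        elif word_id != previous_word_id:
            # First sub-token of a word keeps the word's label.
            aligned.append(word_labels[word_id])
        else:
            # Later sub-tokens of the same word are excluded from the loss.
            aligned.append(ignore_index)
        previous_word_id = word_id
    return aligned


# Illustrative input: three words, the second split into two sub-tokens.
word_ids = [None, 0, 1, 1, 2, None]
word_labels = [0, 1, 9]  # label ids, e.g. "O", "B-Skill", "I-Skill"
print(align_labels(word_ids, word_labels))
```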

### Train an Entity Similarity Model

Set the required hyperparameters in the `sbert_train` call in the `sbert_train.py` file:

```python
sbert_train(model_id='all-MiniLM-L6-v2', dataset_path='your/dataset/path', output_path='your/output/path')
```

To train the similarity model, run the following script from the `train` directory:

```sh
python sbert_train.py
```

The dataset must be a CSV file with two columns, such as `title` and `esco_label`, where each row contains a pair of related text entries used during training. Make sure the dataset has no missing values, since missing values can cause training to fail. Here is an example of what your CSV file might look like:

| title                   | esco\_label                 |
| ----------------------- | --------------------------- |
| Senior Conflict Manager | public institution director |
| etc                     | etc                         |

More information can be found [here](/tabiya-documentation/es/nuestra-pila-tecnologica/livelihoods-classifier/datasets.md#entity-similarity).
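
Before training, it may help to check the CSV for missing values. The following is a minimal sketch using only the standard library; the function name `find_incomplete_rows` and the sample data are hypothetical, and the column names `title` and `esco_label` are taken from the example above and should be adapted to your dataset.

```python
import csv
import io


def find_incomplete_rows(csv_text, columns=("title", "esco_label")):
    """Hypothetical helper: return line numbers of rows with an empty field."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    absent = [name for name in columns if name not in header]
    if absent:
        raise ValueError(f"CSV is missing expected columns: {absent}")
    incomplete = []
    for row in reader:
        # Short rows yield None for missing fields, so normalise before stripping.
        if any(not (row[name] or "").strip() for name in columns):
            # line_num is the source line on which the current row ended.
            incomplete.append(reader.line_num)
    return incomplete


# Illustrative data: the second data row has no esco_label.
sample = (
    "title,esco_label\n"
    "Senior Conflict Manager,public institution director\n"
    "Data Analyst,\n"
)
print(find_incomplete_rows(sample))
```

To check a file on disk, read its contents with `open(path, newline="", encoding="utf-8")` and pass the text to the function.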

