feat: add Job Title Similarity ranking task #28
Mattdl
left a comment
Thanks @federetyk, super valuable contribution!
Code looks good to me. I would just add some clarifications to the JobSimilarity task, as it is the first task of its kind (see review).
"""
Job Title Similarity ranking task based on Zbib et al. (2022) and Deniz et al. (2024).

Predict similar job titles from the datasets presented in the aforementioned papers.
Would be great to give a bit more context here on the JobTitleSimilarityTask.
- Provide a link to the Hugging Face repo, and briefly describe how it is used (corpus and query sets) and which languages it covers.
- Explain how this task differs from JobNormalization: here each job title has multiple similar job titles and all others are deemed non-similar, whereas JobNormalization maps each title to a single best-matching canonical job title.
- Give an example of a query and its labels.
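To make the distinction in the second point concrete, here is a minimal Python sketch of how the labels for the two tasks might be shaped. The class names, field names, and job titles are invented for illustration; they are not taken from the Avature dataset or from the WorkRB code.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class SimilarityExample:
    """Hypothetical shape: one query, several relevant corpus titles."""
    query: str
    relevant_titles: FrozenSet[str]


@dataclass(frozen=True)
class NormalizationExample:
    """Hypothetical shape: one query, exactly one canonical title."""
    query: str
    canonical_title: str


def binary_labels(example: SimilarityExample, corpus: List[str]) -> List[int]:
    """Mark each corpus title as similar (1) or non-similar (0) to the query."""
    return [int(title in example.relevant_titles) for title in corpus]


# Invented toy data, not real dataset entries.
corpus = ["Software Developer", "Programmer", "Backend Engineer", "Nurse", "Accountant"]
sim = SimilarityExample(
    query="Software Engineer",
    relevant_titles=frozenset({"Software Developer", "Programmer", "Backend Engineer"}),
)
norm = NormalizationExample(query="Sr. Software Eng.", canonical_title="Software Engineer")

labels = binary_labels(sim, corpus)
print(labels)
```

In the similarity case several corpus entries are positive for one query, so evaluation needs a ranking metric over the whole corpus; in the normalization case there is a single correct target per query.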
Additionally, to create visibility, I would add your dataset entry to the README.md table of datasets (along with the number of targets x number of queries).
@Mattdl Thanks for the review! I have updated the docstring with the requested context and added the entry to the README table as suggested.
Mattdl
left a comment
Looks great, ready to merge.
closes #24
Description
Added the Job Title Similarity dataset (Avature/Job-Title-Similarity) as a new ranking task in WorkRB. This task evaluates a model's ability to rank job titles by semantic similarity to a query job title. The dataset includes 11 languages (en, de, es, fr, it, ja, ko, nl, pl, pt, zh) with ~105 queries and ~2,500 corpus job titles per language.
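As a rough illustration of what "ranking job titles by semantic similarity to a query" means in evaluation terms, the sketch below ranks a toy corpus and scores the ranking with average precision. It rests on several assumptions: the token-overlap (Jaccard) scorer is a stand-in for a real embedding model, average precision is only one plausible metric, the titles are invented, and none of the function names come from WorkRB.

```python
from typing import Callable, List, Set


def jaccard(a: str, b: str) -> float:
    """Toy similarity: token overlap between two titles (stand-in for a model)."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0


def rank_corpus(
    query: str, corpus: List[str], score: Callable[[str, str], float] = jaccard
) -> List[str]:
    """Return corpus titles ordered from most to least similar to the query."""
    return sorted(corpus, key=lambda title: score(query, title), reverse=True)


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Mean of precision@k taken at each rank k where a relevant title appears."""
    hits = 0
    total = 0.0
    for rank, title in enumerate(ranked, start=1):
        if title in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


# Invented toy data, not real dataset entries.
corpus = ["senior data engineer", "data analyst", "registered nurse", "big data engineer"]
relevant = {"senior data engineer", "big data engineer"}

ranked = rank_corpus("data engineer", corpus)
ap = average_precision(ranked, relevant)
print(ranked, ap)
```

Averaging this score over all queries of a language would give a mean average precision per language, which is one common way to summarise a ranking task of this shape.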
Changes:
- RankingTaskGroup.JOBSIM in src/workrb/types.py
- Language enum in src/workrb/tasks/abstract/ranking_base.py
- JobTitleSimilarityRanking task class in src/workrb/tasks/ranking/job_similarity.py
- src/workrb/tasks/__init__.py and src/workrb/tasks/ranking/__init__.py
- tests/test_task_loading.py

Task characteristics:
References:
Checklist