An end-to-end framework for information extraction from Italian resumes / Barducci, A.; Iannaccone, S.; La Gatta, V.; Moscato, V.; Sperlì, G.; Zavota, S. - In: EXPERT SYSTEMS WITH APPLICATIONS. - ISSN 0957-4174. - 210: Article number 118487 (2022). [10.1016/j.eswa.2022.118487]
An end-to-end framework for information extraction from Italian resumes
V. La Gatta; V. Moscato; G. Sperlì
2022
Abstract
Nowadays, recruitment processes are increasingly being automated by intelligent systems that match the best candidates to companies’ open positions, and vice versa. However, extracting information from the unstructured documents involved in these processes (e.g., resumes and job descriptions) remains an open challenge. These documents are highly heterogeneous in form and style, and there are no predefined standards across companies and countries. In this paper, we address the resume information extraction problem, focusing on documents from the Italian labor market. Specifically, we propose an effective and efficient end-to-end framework that provides a complete overview of a candidate, including personal information, skills, and work experience. After extracting the raw data from the resume documents, the system segments them into semantically consistent parts using linguistic patterns. Each segment is then processed with a named entity recognition (NER) algorithm based on pre-trained language models. This step extracts relevant information that an HR specialist can consult to assess a candidate's suitability for a job offer. We collected and labeled a new Italian resume dataset. Our results demonstrate the effectiveness of the proposed method, in particular the clear gains in NER performance that our segmentation strategy brings over standard line-based segmentation approaches. In addition, our system achieves promising performance when combined with modern NLP models.
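The abstract describes segmenting each resume into semantically consistent parts using linguistic patterns before running NER. It contrasts this approach with standard line-based segmentation. The paper's actual patterns are not given here, so what follows is only a minimal Python sketch under stated assumptions.

The sketch treats a line that consists solely of a known Italian section heading as the start of a new segment. It then groups the following lines under that heading. The heading vocabulary in `SECTION_PATTERNS` is a hypothetical list, loosely modeled on common and Europass-style headings. The names `Segment`, `classify_heading`, `segment_resume` and `segment_by_line` are also illustrative. `segment_by_line` stands in for the line-based baseline, where every non-empty line is a separate unit.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


def _heading(pattern: str) -> re.Pattern:
    """Compile a pattern that must match a whole line (optionally ending in ':')."""
    return re.compile(rf"^\s*(?:{pattern})\s*:?\s*$", re.IGNORECASE)


# Hypothetical heading vocabulary for Italian resumes (Europass-style and common
# variants). These are illustrative assumptions, NOT the patterns used in the paper.
SECTION_PATTERNS = {
    "personal_info": _heading(r"(?:dati|informazioni)\s+personali"),
    "experience": _heading(r"esperienz[ae](?:\s+(?:lavorativ[ae]|professional[ie]))?"),
    "education": _heading(r"istruzione(?:\s+e\s+formazione)?|formazione"),
    "skills": _heading(
        r"competenze(?:\s+(?:linguistiche|digitali|informatiche|tecniche|personali))?"
        r"|capacit[àa]\s+e\s+competenze(?:\s+\w+)?"
    ),
}


@dataclass
class Segment:
    """A contiguous block of resume lines sharing one section label."""
    label: str
    lines: list[str] = field(default_factory=list)

    @property
    def text(self) -> str:
        return "\n".join(self.lines)


def classify_heading(line: str) -> Optional[str]:
    """Return the section label if the line is a known heading, else None."""
    for label, pattern in SECTION_PATTERNS.items():
        if pattern.match(line):
            return label
    return None


def segment_resume(raw_text: str) -> list[Segment]:
    """Pattern-based segmentation: group lines under the latest recognized heading."""
    segments = [Segment("header")]  # holds any text before the first heading
    for line in raw_text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # blank lines carry no content
        label = classify_heading(stripped)
        if label is not None:
            segments.append(Segment(label))  # a heading opens a new segment
        else:
            segments[-1].lines.append(stripped)
    # Drop segments with no content (e.g. an empty header or two headings in a row).
    return [seg for seg in segments if seg.lines]


def segment_by_line(raw_text: str) -> list[str]:
    """Line-based baseline: every non-empty line is its own segment."""
    return [line.strip() for line in raw_text.splitlines() if line.strip()]


if __name__ == "__main__":
    sample = (
        "Mario Rossi\n"
        "mario.rossi@example.com\n"
        "Esperienze lavorative\n"
        "2019-2022 Sviluppatore software presso ACME S.p.A.\n"
        "Istruzione\n"
        "Laurea in Informatica, Università di Napoli\n"
        "Competenze:\n"
        "Python, SQL\n"
    )
    for seg in segment_resume(sample):
        print(f"[{seg.label}] {seg.text!r}")
    print(len(segment_by_line(sample)), "line-based segments")
```

In a full pipeline, each `Segment.text` would then be passed to the NER model. In the paper, that model is based on pre-trained language models, and the step is not shown here. One plausible benefit of this grouping is that each NER input carries its section context, for example a date range next to a job title, rather than an isolated line. The sketch makes no claim about the paper's measured results. Whole-line heading matching is a deliberately conservative choice: it avoids mislabeling body lines such as "Capacità di lavorare in team", at the cost of missing headings that share a line with content.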
Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.