
An end-to-end framework for information extraction from Italian resumes / Barducci, A.; Iannaccone, S.; La Gatta, V.; Moscato, V.; Sperlì, G.; Zavota, S. - In: EXPERT SYSTEMS WITH APPLICATIONS. - ISSN 0957-4174. - 210: Article number 118487 (2022). [10.1016/j.eswa.2022.118487]

An end-to-end framework for information extraction from Italian resumes

V. La Gatta; V. Moscato; G. Sperlì
2022

Abstract

Nowadays, recruitment processes are increasingly automated by intelligent systems that match the best candidates to companies' open positions, and vice versa. However, extracting information from the unstructured documents involved in these processes (e.g., resumes and job descriptions) remains an open challenge because of their high heterogeneity in form and style and the lack of predefined standards across companies and countries. In this paper, we address the resume information extraction problem, focusing on documents from the Italian labor market. Specifically, we propose an effective and efficient end-to-end framework that provides a complete overview of a candidate, including their personal information, skills, and work experience. After extracting the raw data from the resume documents, the system segments them into semantically consistent parts using linguistic patterns. Each segment is then processed by a named entity recognition (NER) algorithm based on pre-trained language models, which extracts the relevant information an HR specialist can consult to assess a candidate's suitability for a job offer. We collected and labeled a new Italian resume dataset, and our results demonstrate the effectiveness of the proposed method, in particular the substantial gains our segmentation strategy brings to NER performance compared with standard line-based segmentation approaches. In addition, our system achieves promising performance when combined with modern NLP models.
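The abstract contrasts segmentation by linguistic patterns with line-based segmentation but does not spell out the patterns. The following is a minimal Python sketch, assuming that section boundaries can be detected by matching a few Italian heading keywords at the start of a line. The keyword list, the `SECTION_PATTERNS` dictionary, the `Segment` class, and the functions `pattern_based_segments` and `line_based_segments` are hypothetical illustrations, not the authors' implementation, and the sketch stops before the NER stage.

```python
import re
from dataclasses import dataclass, field

# Hypothetical heading patterns for common Italian resume sections.
# These keywords are illustrative assumptions, not the paper's linguistic patterns.
SECTION_PATTERNS = {
    "personal_info": re.compile(r"^(dati|informazioni)\s+personali\b", re.IGNORECASE),
    "experience": re.compile(r"^esperienz[ae]\s+(lavorativ[ae]|professional[ei])\b", re.IGNORECASE),
    "education": re.compile(r"^(istruzione|formazione)\b", re.IGNORECASE),
    "skills": re.compile(r"^(competenze|capacit[àa])\b", re.IGNORECASE),
}


@dataclass
class Segment:
    label: str
    lines: list[str] = field(default_factory=list)

    @property
    def text(self) -> str:
        return "\n".join(self.lines)


def line_based_segments(raw: str) -> list[str]:
    """Baseline: every non-empty line becomes its own segment."""
    return [line.strip() for line in raw.splitlines() if line.strip()]


def pattern_based_segments(raw: str) -> list[Segment]:
    """Group lines under the most recent heading that matches a section pattern."""
    segments = [Segment("header")]  # collects text that precedes the first heading
    for line in line_based_segments(raw):
        label = next(
            (name for name, pattern in SECTION_PATTERNS.items() if pattern.match(line)),
            None,
        )
        if label is not None:
            segments.append(Segment(label))  # a heading opens a new segment
        else:
            segments[-1].lines.append(line)  # content joins the current segment
    # Drop segments that ended up empty (e.g. a heading with no content below it).
    return [seg for seg in segments if seg.lines]


# Invented example resume, used only to illustrate the two strategies.
resume = """Mario Rossi
mario.rossi@example.com
Esperienze lavorative
2019-2022 Sviluppatore software presso Acme S.r.l.
Istruzione
Laurea in Informatica
Competenze
Python, SQL
"""

for seg in pattern_based_segments(resume):
    print(f"{seg.label}: {seg.text!r}")
print(len(line_based_segments(resume)), "line-based segments")
```

Under these assumptions, the pattern-based variant yields four labeled segments (header, experience, education, skills) for the invented resume, while the line-based baseline yields eight isolated lines, so a downstream NER model would see each entry together with its section context. A keyword heuristic like this is brittle: a content line that happens to begin with "Formazione" would wrongly open a new section.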
Files in this item: publisher's version (PDF, 1.13 MB); access restricted to authorized users; license: publisher's copyright.


Use this identifier to cite or link to this item: https://hdl.handle.net/11588/902180
Citations: Scopus 22; Web of Science (ISI) 11