This paper was developed within the European project BLUE-ETS (Economic and Trade Statistics), in the work package devoted to proposing new tools for collecting and analysing data. To obtain business information from documentary repositories, we refer here to documents produced for non-statistical purposes. The use of secondary sources, typical of data and text mining, is an opportunity not yet sufficiently explored by National Statistical Institutes (NSIs). NSIs aim to collect and present information in a usable and easily readable way. The use of textual data is still viewed as too problematic because of the complexity and cost of the pre-processing procedures and, often, the lack of suitable analytical tools. Our aim is to identify statistical linguistic sources through an in-depth analysis of one management commentary. From a methodological viewpoint, we propose a tool for exploring relations between words at the micro-data level, derived from network data analysis, namely ego networks, applied together with lexical correspondence analysis.
Text Mining tools for extracting knowledge from firms annual reports / Balbi, Simona; Stawinoga, Agnieszka Elzbieta; Triunfo, Nicole. - (2012), pp. 67-80. (Paper presented at the conference 11 Journées internationales d'analyse statistique de données textuelles, held in Liège, Belgium, 13-15 June 2012).
Text Mining tools for extracting knowledge from firms annual reports
BALBI, SIMONA;STAWINOGA, AGNIESZKA ELZBIETA;TRIUNFO, NICOLE
2012