A Big Data Approach for Health Data Information Retrieval / Ciampi, M.; Masciari, E.; De Pietro, G.; Silvestri, S. - (2019), pp. 2533-2540. (Paper presented at the 2019 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2019, held in the USA in 2019) [10.1109/BIBM47256.2019.8983302].
A Big Data Approach for Health Data Information Retrieval
Ciampi M.; Masciari E.; De Pietro G.; Silvestri S.
2019
Abstract
In this paper we address the problem of analyzing biomedical data collections in order to find semantic similarity among textual documents. Specifically, we leverage word embedding models obtained with the word2vec algorithm, together with a dedicated Big Data architecture for managing them, and define an approach that retrieves semantically similar texts from a very large biomedical text corpus. The proposed architecture was developed to improve on a previous implementation: it lowers the computational time, which makes it possible to use the whole PubMed library as the dataset and demonstrates the usability of the methodology in a real-world context.
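The abstract does not say how document-level similarity is derived from the word2vec vectors. A common baseline, shown here only as a sketch, averages the vectors of a document's words and ranks documents by cosine similarity to a query. Assumptions: the tiny `TOY_EMBEDDINGS` table is hypothetical and stands in for a trained word2vec model (for example, vectors loaded with a library such as gensim); the helper names `doc_vector`, `cosine`, and `rank_similar` are illustrative; and everything runs on a single machine, unlike the Big Data architecture the paper describes.

```python
import math
from typing import Dict, List, Optional, Tuple

Vector = List[float]

# Hypothetical 3-dimensional toy embeddings standing in for a trained word2vec model.
TOY_EMBEDDINGS: Dict[str, Vector] = {
    "insulin": [0.9, 0.1, 0.0],
    "diabetes": [0.8, 0.2, 0.1],
    "glucose": [0.85, 0.15, 0.05],
    "tumor": [0.1, 0.9, 0.2],
    "cancer": [0.05, 0.95, 0.1],
    "therapy": [0.4, 0.5, 0.3],
}


def doc_vector(tokens: List[str], embeddings: Dict[str, Vector]) -> Optional[Vector]:
    """Mean of the vectors of in-vocabulary tokens; None if no token is known."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_similar(query: str, corpus: Dict[str, str],
                 embeddings: Dict[str, Vector], top_k: int = 3) -> List[Tuple[str, float]]:
    """Rank corpus documents by cosine similarity of their mean word vectors to the query."""
    q = doc_vector(query.lower().split(), embeddings)
    if q is None:
        return []
    scored = []
    for doc_id, text in corpus.items():
        d = doc_vector(text.lower().split(), embeddings)
        if d is not None:  # skip documents with no known words
            scored.append((doc_id, cosine(q, d)))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    corpus = {
        "doc1": "Insulin therapy in diabetes",
        "doc2": "Tumor growth and cancer therapy",
        "doc3": "Glucose monitoring",
    }
    for doc_id, score in rank_similar("diabetes glucose", corpus, TOY_EMBEDDINGS):
        print(doc_id, round(score, 3))
```

Mean pooling is cheap and each document vector can be computed independently, which suits a distributed setting, but it discards word order; it is not necessarily the scoring scheme used in the paper.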