Nearest query on distributed binary trees starting from a random node / Gargiulo, Francesco; Amato, Flora; Moscato, Vincenzo; Picariello, Antonio; Sperli', Giancarlo. - 649:(2016), pp. 257-271. (Paper presented at the 7th International Conference on Knowledge Engineering and Semantic Web, KESW 2016, held in Prague (Czech Republic), September 21-23, 2016) [10.1007/978-3-319-45880-9_20].
Nearest query on distributed binary trees starting from a random node
GARGIULO, FRANCESCO;AMATO, FLORA;MOSCATO, VINCENZO;PICARIELLO, ANTONIO;SPERLI', GIANCARLO
2016
Abstract
This paper proposes a new distributed data structure based on binary trees to support k-nearest neighbor queries over very large databases. The indexing structure is distributed across a network of “peers”, each of which hosts a part of the tree, and nodes communicate by message passing. This approach has two main advantages: it can (i) handle a larger number of nodes and points than a single-peer architecture and (ii) process multiple queries efficiently. In particular, we propose a novel version of the k-nearest neighbor algorithm that can start a query at a randomly chosen peer. Preliminary experiments show that in about 65% of cases a query that starts at a random node does not involve the peer containing the root of the tree.
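The random-start idea can be illustrated with a small single-process sketch. It is not the authors' algorithm, which the abstract does not spell out. It assumes a k-d tree as the binary tree, parent pointers, a bounding cell stored on every node, and a hypothetical `peer` label that groups consecutive preorder nodes. Message passing is replaced by plain function calls, and the names `Node`, `build`, `knn_from`, and `nodes_per_peer` are illustrative only. The query searches the subtree of the start node, then climbs toward the root only while the ball around the query with the current k-th distance is not strictly inside the current node's cell.

```python
import heapq
import math
import random


class Node:
    def __init__(self, point, axis, lo, hi, peer):
        self.point, self.axis = point, axis
        self.lo, self.hi = lo, hi      # bounding cell of the subtree rooted here
        self.peer = peer               # hypothetical id of the hosting peer
        self.parent = self.left = self.right = None


def build(points, lo, hi, nodes, depth=0, nodes_per_peer=8):
    """Build a k-d tree; consecutive preorder nodes share an assumed peer id."""
    if not points:
        return None
    axis = depth % len(lo)
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    node = Node(points[mid], axis, lo, hi, peer=len(nodes) // nodes_per_peer)
    nodes.append(node)
    split = points[mid][axis]
    left_hi = hi[:axis] + (split,) + hi[axis + 1:]
    right_lo = lo[:axis] + (split,) + lo[axis + 1:]
    node.left = build(points[:mid], lo, left_hi, nodes, depth + 1, nodes_per_peer)
    node.right = build(points[mid + 1:], right_lo, hi, nodes, depth + 1, nodes_per_peer)
    for child in (node.left, node.right):
        if child is not None:
            child.parent = node
    return node


def box_dist(q, lo, hi):
    """Euclidean distance from q to the axis-aligned box [lo, hi]."""
    return math.sqrt(sum(max(l - x, 0.0, x - h) ** 2 for x, l, h in zip(q, lo, hi)))


def offer(heap, p, q, k):
    """Keep the k closest points in a max-heap of (-distance, point)."""
    d = math.dist(p, q)
    if len(heap) < k:
        heapq.heappush(heap, (-d, p))
    elif d < -heap[0][0]:
        heapq.heapreplace(heap, (-d, p))


def search_down(node, q, k, heap, peers):
    """Search a subtree, skipping cells farther than the current k-th distance."""
    if node is None:
        return
    if len(heap) == k and box_dist(q, node.lo, node.hi) > -heap[0][0]:
        return
    peers.add(node.peer)
    offer(heap, node.point, q, k)
    children = [c for c in (node.left, node.right) if c is not None]
    for child in sorted(children, key=lambda c: box_dist(q, c.lo, c.hi)):
        search_down(child, q, k, heap, peers)


def knn_from(start, q, k):
    """k-NN query starting at an arbitrary node; returns (neighbors, peers touched)."""
    heap, peers = [], set()
    search_down(start, q, k, heap, peers)
    node = start
    while node.parent is not None:
        # Distance from q to the nearest face of the cell (negative if q is outside).
        margin = min(min(x - l, h - x) for x, l, h in zip(q, node.lo, node.hi))
        if len(heap) == k and margin > -heap[0][0]:
            break  # no point outside this subtree can be closer; stop climbing
        parent = node.parent
        peers.add(parent.peer)
        offer(heap, parent.point, q, k)
        sibling = parent.right if node is parent.left else parent.left
        search_down(sibling, q, k, heap, peers)
        node = parent
    return sorted((-d, p) for d, p in heap), peers


random.seed(0)
pts = [(random.random(), random.random()) for _ in range(500)]
nodes = []
root = build(pts, (-math.inf, -math.inf), (math.inf, math.inf), nodes)
neighbors, peers = knn_from(random.choice(nodes), (0.5, 0.5), k=3)
print(neighbors, "root peer involved:", root.peer in peers)
```

Whether `root.peer` appears in `peers` depends on the start node, the query point, and the assumed assignment of nodes to peers, so this sketch is not expected to reproduce the 65% figure reported above.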