Benchmarking big data architectures for social networks data processing using public cloud platforms / Persico, Valerio; Pescapé, Antonio; Picariello, Antonio; Sperlí, Giancarlo. - In: FUTURE GENERATION COMPUTER SYSTEMS. - ISSN 0167-739X. - 89:(2018), pp. 98-109. [10.1016/j.future.2018.05.068]
Benchmarking big data architectures for social networks data processing using public cloud platforms
Persico, Valerio; Pescapé, Antonio; Picariello, Antonio; Sperlí, Giancarlo
2018
Abstract
Popular Online Social Networks (OSNs) contain heterogeneous multimedia data sources, which makes the underlying processing systems complex and calls for application-specific yet comprehensive benchmarking. The variety of big data architectures (and of their possible realizations) for both batch and streaming processing, across a huge number of application domains, makes benchmarking these systems critical for both the academic and industrial communities. In this work, we evaluate the performance of two state-of-the-art big data architectures, namely Lambda and Kappa, taking OSN data analysis as the reference task. In more detail, we have implemented and deployed an influence analysis algorithm on the Microsoft Azure public cloud platform to investigate the impact of a number of factors on the performance obtained by cloud users. These factors comprise the type of architecture implemented, the volume of data to analyze, the size of the cluster of nodes realizing the architectures and the characteristics of those nodes, the deployment costs, and the quality of the output when the analysis is subject to strict temporal deadlines. Experimental campaigns have been carried out on the Yahoo Flickr Creative Commons 100 Million (YFCC100M) dataset. The reported results and discussions show that the Lambda architecture outperforms the Kappa architecture for the class of problems investigated. By providing a variety of analyses (e.g., the impact of dataset size, scaling, and cost), this paper offers insights into the performance of these state-of-the-art big data architectures that are helpful to both experts and newcomers interested in deploying big data architectures on cloud platforms.