A novel fuzzy‑entropy based online fuzzy C‑Means clustering algorithm for massive data / Cardone, Barbara; Di Martino, Ferdinando. - In: EVOLUTIONARY INTELLIGENCE. - ISSN 1864-5909. - 18:86(2025). [10.1007/s12065-025-01076-0]

A novel fuzzy‑entropy based online fuzzy C‑Means clustering algorithm for massive data

Barbara Cardone; Ferdinando Di Martino
2025

Abstract

Because standard clustering approaches scale poorly to extensive data, considerable recent research has focused on clustering large and very large datasets. In particular, several variants of the well-known Fuzzy C-Means (FCM) algorithm have been proposed that partition the dataset and aggregate the intermediate clustering results. Among them, Online Fuzzy C-Means (OFCM) is one of the most widely used for clustering large amounts of data. It splits the dataset into equal-sized subsets, or chunks, and assigns a weight to each chunk depending on the membership degrees per cluster. This study introduces a novel variant of OFCM designed to improve its performance. The proposed method integrates a cluster compactness measure, quantified by the fuzzy entropy of each cluster, into the weight attribution process. Comparative experiments on diverse classification datasets of varying sizes show that the proposed algorithm significantly improves clustering accuracy compared with standard OFCM, without increasing the computational complexity of the algorithm. Furthermore, the approach yields performance comparable to that of heuristic Fuzzy C-Means algorithms, with the advantage of shorter execution times. Future research will explore feature selection and reduction techniques to adapt the proposed algorithm to massive datasets with a very high number of features.
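The abstract does not give formulas, so the following is only an illustrative sketch of the idea of entropy-aware weighting, not the authors' method. It assumes the De Luca-Termini fuzzy entropy as the compactness measure and a multiplicative discount `(1 - entropy)` on the usual membership-sum weight. Both choices, and the names `fuzzy_entropy` and `chunk_cluster_weights`, are hypothetical.

```python
import math


def fuzzy_entropy(memberships):
    """Normalized De Luca-Termini fuzzy entropy of one cluster.

    `memberships` holds the membership degrees (each in [0, 1]) of the
    points of a chunk to a single cluster. The result lies in [0, 1]:
    0 for crisp memberships, 1 when every degree equals 0.5.
    ASSUMPTION: the paper may use a different entropy definition.
    """
    def h(u):
        # Shannon-like term; defined as 0 at the crisp endpoints.
        if u <= 0.0 or u >= 1.0:
            return 0.0
        return -(u * math.log(u) + (1.0 - u) * math.log(1.0 - u))

    if not memberships:
        return 0.0
    return sum(h(u) for u in memberships) / (len(memberships) * math.log(2.0))


def chunk_cluster_weights(U, use_entropy=True):
    """Per-cluster weights for one chunk.

    U[i][j] is the membership degree of point j to cluster i.
    The base weight is the sum of memberships per cluster. With
    `use_entropy=True` it is scaled by (1 - fuzzy entropy), so that
    more compact (less fuzzy) clusters weigh more.
    ASSUMPTION: this combination rule is illustrative only.
    """
    weights = []
    for row in U:
        mass = sum(row)
        if use_entropy:
            mass *= 1.0 - fuzzy_entropy(row)
        weights.append(mass)
    return weights


if __name__ == "__main__":
    # Toy chunk: 2 clusters, 3 points.
    U = [[0.9, 0.8, 0.1],
         [0.1, 0.2, 0.9]]
    print(chunk_cluster_weights(U, use_entropy=False))
    print(chunk_cluster_weights(U, use_entropy=True))
```

Under these assumptions the entropy term costs one extra pass over the membership matrix of each chunk, which is the same order of work as computing the membership sums themselves.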
Files in this item:
File: s12065-025-01076-0.pdf
Access: open access
Description: Article in the publisher's version, in PDF format
Type: Publisher's version (PDF)
License: Public domain
Size: 1.13 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11588/1007059