A novel fuzzy‑entropy based online fuzzy C‑Means clustering algorithm for massive data / Cardone, Barbara; Di Martino, Ferdinando. - In: EVOLUTIONARY INTELLIGENCE. - ISSN 1864-5909. - 18:86(2025). [10.1007/s12065-025-01076-0]
A novel fuzzy‑entropy based online fuzzy C‑Means clustering algorithm for massive data
Barbara Cardone; Ferdinando Di Martino
2025
Abstract
Standard clustering approaches handle very large datasets poorly, so considerable recent research has focused on clustering large and extremely large datasets. In particular, several variants of the well-known Fuzzy C-Means (FCM) algorithm have been proposed that partition the dataset and aggregate the intermediate clustering results. Among them, Online Fuzzy C-Means (OFCM) is one of the most widely used for clustering large amounts of data. It splits the dataset into equal-sized subsets, or chunks, and assigns a weight to each chunk depending on the membership degrees per cluster. This study introduces a novel variant of OFCM designed to improve its performance. The proposed method integrates a cluster compactness measure, quantified by the fuzzy entropy of each cluster, into the weight attribution process. Comparative experiments on classification datasets of varying sizes show that the proposed algorithm significantly improves clustering accuracy over standard OFCM without increasing computational complexity. The approach also yields performance comparable to that of heuristic Fuzzy C-Means algorithms, with shorter execution times. Future research will explore feature selection and reduction techniques to adapt the proposed algorithm to massive datasets with an exceptionally large number of features.

| File | Size | Format |
|---|---|---|
| s12065-025-01076-0.pdf (open access). Description: article in the publisher's version, in PDF format. Type: Publisher's Version (PDF). License: Public domain | 1.13 MB | Adobe PDF |
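
The chunk-weighting idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no formulas, so the function names, the use of the De Luca-Termini fuzzy entropy, and the rule `weight * (1 - entropy)` are all assumptions chosen for illustration. The sketch computes standard FCM membership degrees for one chunk, derives a per-cluster weight from the summed memberships (as in OFCM-style weighting), and then reduces the weight of clusters whose memberships are diffuse, that is, whose fuzzy entropy is high.

```python
import numpy as np

def fcm_memberships(X, centers, m=2.0):
    """Standard FCM membership degrees u[i, j] of point j in cluster i."""
    # Distances between every center and every point, shape (C, N).
    d = np.linalg.norm(X[None, :, :] - centers[:, None, :], axis=2)
    d = np.fmax(d, 1e-12)  # avoid division by zero when a point sits on a center
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=0, keepdims=True)

def fuzzy_entropy(U):
    """Per-cluster De Luca-Termini fuzzy entropy, normalized to [0, 1].

    Assumption: the abstract does not say which entropy definition is used.
    """
    u = np.clip(U, 1e-12, 1.0 - 1e-12)
    h = -(u * np.log(u) + (1.0 - u) * np.log(1.0 - u))
    return h.mean(axis=1) / np.log(2.0)

def chunk_weights(U, use_entropy=True):
    """Per-cluster weight of one chunk.

    The base weight is the total membership per cluster. The entropy
    correction is a hypothetical rule: compact clusters (low entropy)
    keep their weight, diffuse clusters are down-weighted.
    """
    w = U.sum(axis=1)
    if use_entropy:
        w = w * (1.0 - fuzzy_entropy(U))
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0.0, 0.3, (50, 2)), rng.normal(3.0, 1.0, (50, 2))])
    centers = np.array([[0.0, 0.0], [3.0, 3.0]])  # illustrative fixed centers
    # Split the data into equal-sized chunks, as OFCM does.
    for k, chunk in enumerate(np.array_split(rng.permutation(X), 4)):
        U = fcm_memberships(chunk, centers)
        print(k, chunk_weights(U, use_entropy=False), chunk_weights(U))
```

In this sketch the entropy term costs one extra pass over the membership matrix that FCM already computes, which is in line with the abstract's statement that the computational complexity does not increase.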
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


