Comparison of K-Means and Hierarchical Clustering Methods for Buffalo Milk Production Data

Trapanese, Lucia; Bifulco, Giovanna; Santinello, Matteo; Pasquino, Nicola; Campanile, Giuseppe; Salzano, Angela

doi:10.3390/ani15223246

This study investigated the use of K-means and hierarchical clustering to group Italian Mediterranean buffalo using routinely collected test-day records. The analysis was first conducted on a combined dataset comprising three buffalo herds and subsequently on each herd individually. The main objective was to determine whether data-driven groupings could support improvements in general herd management strategies. K-means consistently outperformed hierarchical clustering across all datasets, with higher average silhouette scores (0.17–0.18 vs. 0.10–0.12), more favorable Davies–Bouldin Index values (DBI; 2.05–2.16 vs. 2.11–2.5), and higher Calinski–Harabasz Index values (CHI; 1034–3877 vs. 729–2109); in each pair, the first range refers to K-means and the second to hierarchical clustering. K-means identified two clusters in the combined dataset and in two of the three herds, and three clusters in the remaining herd. Cluster composition analysis revealed that days in milk and milk yield were the main discriminating factors when two clusters were formed. When three clusters emerged, K-means also identified a subgroup of animals that differed from the others in both age and lactation stage. These findings were supported by analysis of variance (ANOVA), which showed statistically significant differences between clusters for most of the evaluated variables.
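The three internal validity indices reported above can be illustrated with a short example. The following is a minimal pure-Python sketch based on the standard textbook definitions of each index, not the authors' code. The function names, the toy two-dimensional points, and the two label vectors are made up for illustration and are not buffalo data. A higher silhouette score and CHI, and a lower DBI, indicate more compact and better separated clusters.

```python
import math
from collections import defaultdict


def _groups(points, labels):
    """Map each cluster label to the list of points assigned to it."""
    groups = defaultdict(list)
    for p, lab in zip(points, labels):
        groups[lab].append(p)
    return groups


def _centroid(pts):
    """Coordinate-wise mean of a list of points."""
    return tuple(sum(c) / len(pts) for c in zip(*pts))


def silhouette_score(points, labels):
    """Mean silhouette coefficient (range -1 to 1; higher is better)."""
    groups = _groups(points, labels)
    total = 0.0
    for p, lab in zip(points, labels):
        own = groups[lab]
        if len(own) == 1:
            continue  # a singleton cluster contributes 0 by convention
        # a: mean distance to the other members of the same cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in groups.items()
            if other_lab != lab
        )
        total += (b - a) / max(a, b)
    return total / len(points)


def davies_bouldin_index(points, labels):
    """Mean worst-case ratio of within-cluster scatter to centroid separation (lower is better)."""
    groups = _groups(points, labels)
    cents = {lab: _centroid(pts) for lab, pts in groups.items()}
    scatter = {
        lab: sum(math.dist(p, cents[lab]) for p in pts) / len(pts)
        for lab, pts in groups.items()
    }
    worst = [
        max(
            (scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
            for j in groups
            if j != i
        )
        for i in groups
    ]
    return sum(worst) / len(worst)


def calinski_harabasz_index(points, labels):
    """Ratio of between-cluster to within-cluster dispersion (higher is better)."""
    groups = _groups(points, labels)
    n, k = len(points), len(groups)
    overall = _centroid(points)
    between = sum(
        len(pts) * math.dist(_centroid(pts), overall) ** 2
        for pts in groups.values()
    )
    within = sum(
        math.dist(p, _centroid(pts)) ** 2
        for pts in groups.values()
        for p in pts
    )
    return (between / (k - 1)) / (within / (n - k))


# Toy data (illustrative only): two well-separated groups of four points.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6)]
good = [0, 0, 0, 0, 1, 1, 1, 1]  # labels that match the visible groups
bad = [0, 0, 0, 1, 1, 1, 1, 0]   # labels with two points swapped

for name, labels in (("good", good), ("bad", bad)):
    print(
        name,
        round(silhouette_score(points, labels), 3),
        round(davies_bouldin_index(points, labels), 3),
        round(calinski_harabasz_index(points, labels), 3),
    )
```

On this toy example the well-matched labelling scores better on all three indices than the labelling with swapped points, which is the same kind of comparison the abstract draws between the two clustering methods.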

Comparison of K-Means and Hierarchical Clustering Methods for Buffalo Milk Production Data / Trapanese, L., Bifulco, G., Santinello, M., Pasquino, N., Campanile, G., Salzano, A. - In: ANIMALS. - ISSN 2076-2615. - 15:22(2025). [10.3390/ani15223246]