Evaluating classifier performance in hard classification tasks / Vanacore, Amalia; Ciardiello, Armando. - (2024). (ENBIS-24 International Conference of the European Network for Business and Industrial Statistics, Leuven, Belgium).
Evaluating classifier performance in hard classification tasks
AMALIA VANACORE; ARMANDO CIARDIELLO
2024
Abstract
Several data difficulty factors (e.g., class imbalance, class overlap, outliers and noisy observations, and difficult decision borders) make classification tasks challenging in many practical applications and are active research topics in pattern recognition, machine learning and deep learning. Data complexity factors have been widely discussed in the specialized literature from a model-based or a data-based perspective, whereas less research effort has been devoted to investigating their effect on the behaviour of measures of classifier predictive performance. Our study addresses this gap by investigating the impact of data complexity on the behaviour of several such measures. The investigation was conducted through an extensive set of numerical experiments on artificial data sets. The data generation process was controlled through a set of parameters (e.g., number of features; class frequency distributions; frequency distributions of safe and unsafe instances) that define the characteristics of the generated data. The artificial data sets were classified using several algorithms, whose predictive performance was evaluated through the measures under study. The results highlight that, although the investigated performance measures largely agree for easy classification tasks (i.e., balanced data sets containing only safe instances), their behaviour differs significantly for difficult classification tasks (i.e., with increasing data complexity), which are the norm in many real-world classification problems.
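The contrast drawn in the abstract between easy and hard tasks can be illustrated with a minimal sketch. The abstract does not name the measures under study, so the four used here (accuracy, balanced accuracy, F1 and the Matthews correlation coefficient) are assumptions chosen only because they are common. The two confusion matrices are hypothetical and are not taken from the study's experiments. The function name `binary_measures` and the helper `spread` are likewise illustrative.

```python
import math


def binary_measures(tp, fn, fp, tn):
    """Compute a few common measures from a 2x2 confusion matrix.

    The choice of measures is an assumption for illustration; the
    abstract does not list the measures investigated in the study.
    """
    n = tp + fn + fp + tn
    accuracy = (tp + tn) / n
    recall = tp / (tp + fn)              # sensitivity on the positive class
    specificity = tn / (tn + fp)
    balanced_accuracy = (recall + specificity) / 2
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "balanced_accuracy": balanced_accuracy,
        "f1": f1,
        "mcc": mcc,
    }


def spread(measures):
    """Gap between the largest and smallest measure value."""
    return max(measures.values()) - min(measures.values())


# Hypothetical "easy" task: balanced classes, few errors.
easy = binary_measures(tp=95, fn=5, fp=5, tn=95)

# Hypothetical "hard" task: 2% minority class, most positives missed.
hard = binary_measures(tp=5, fn=15, fp=10, tn=970)

for name, result in (("easy", easy), ("hard", hard)):
    values = ", ".join(f"{k}={v:.3f}" for k, v in result.items())
    print(f"{name}: {values} (spread={spread(result):.3f})")
```

In the balanced example the four measures fall within a narrow band. In the imbalanced example accuracy stays high while F1 and the Matthews correlation coefficient drop sharply, which is the kind of disagreement the abstract describes.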


