Active Learning For Sequential Classification With Partial Labels / Capezza, Christian; Lepore, Antonio; Paynabar, Kamran. - (2025), pp. 207-207. (15th Scientific Meeting of the Classification and Data Analysis Group, 1st International Scientific Joint Meeting of the Italian and Dutch/Flemish Classification Societies, Napoli, Italia, 8-10 September 2025).
Active Learning For Sequential Classification With Partial Labels
Christian Capezza; Antonio Lepore; Kamran Paynabar
2025
Abstract
In many classification tasks involving sequential or streaming data, acquiring labeled observations is costly or time-consuming, making it infeasible to label every instance. This limitation is particularly evident in industrial process monitoring, where determining whether a process is in control or out of control often requires expert intervention or specialized testing procedures. To address this challenge, we propose a stream-based active learning framework to support classification when only a limited number of labels can be obtained. The method builds on partially hidden Markov models (pHMMs), which capture temporal dependencies while combining labeled and unlabeled data for probabilistic classification. At each time step, the framework evaluates the incoming observation and decides whether to request a label, using a novel dual criterion that balances exploitation (refining classification boundaries) and exploration (identifying new or unknown process states). An online fitting strategy is developed for updating the pHMM over time, including a robust initialization procedure tailored to highly imbalanced classification settings, which are common in quality monitoring applications where most data reflect nominal operation. The framework also incorporates a model selection mechanism to dynamically determine the number of latent states in the process. The framework is evaluated through an extensive simulation study and applied to a case study involving resistance spot welding in the automotive industry, in which process profiles are continuously collected and labels are selectively obtained through ultrasonic inspection.

Acknowledgements: The research activity of C. Capezza and A. Lepore was supported by Piano Nazionale di Ripresa e Resilienza (PNRR) - Missione 5 Componente 2, Investimento 1.3-D.D. 1551.11-10-2022, PE00000004 within the Extended Partnership MICS (Made in Italy - Circular and Sustainable). This manuscript reflects only the authors’ views and opinions; neither the European Union nor the European Commission can be considered responsible for them.
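The per-step labeling decision described in the abstract can be illustrated with a minimal sketch. The abstract does not state the form of the dual criterion, so everything below is an assumption made for illustration: a plain Gaussian HMM forward filter stands in for the pHMM, posterior entropy stands in for the exploitation rule, and a low predictive log-density stands in for the exploration rule. All function names, thresholds, and parameter values are hypothetical and are not taken from the authors' method.

```python
import numpy as np

def gaussian_logpdf(x, mean, var):
    """Log-density of x under a diagonal Gaussian, one value per state."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var, axis=-1)

def forward_step(prev_post, trans, log_emis):
    """One HMM forward-filtering step.

    Returns the updated state posterior and the log predictive density
    of the new observation given the past.
    """
    prior = prev_post @ trans                  # one-step-ahead state probabilities
    log_joint = np.log(prior) + log_emis
    m = log_joint.max()                        # log-sum-exp for numerical stability
    log_evidence = m + np.log(np.exp(log_joint - m).sum())
    return np.exp(log_joint - log_evidence), log_evidence

def should_query(post, log_evidence, entropy_thr=0.5, novelty_thr=-10.0):
    """Illustrative dual rule (assumed, not the authors' criterion).

    exploit: the state posterior is uncertain (high entropy).
    explore: the observation is unlikely under every known state.
    """
    entropy = -np.sum(post * np.log(post + 1e-12))
    exploit = bool(entropy > entropy_thr)
    explore = bool(log_evidence < novelty_thr)
    return exploit or explore, exploit, explore

# Hypothetical two-state process: state 0 = in control, state 1 = out of control.
means = np.array([[0.0], [3.0]])
variances = np.ones((2, 1))
trans = np.array([[0.95, 0.05],
                  [0.10, 0.90]])
prev_post = np.array([0.9, 0.1])

for value in (0.0, 2.0, 12.0):
    log_emis = gaussian_logpdf(np.array([value]), means, variances)
    post, log_ev = forward_step(prev_post, trans, log_emis)
    print(value, should_query(post, log_ev))
```

With these illustrative settings, the observation at the in-control mean triggers no label request, the observation lying between the two state means triggers the exploitation rule, and the observation far from both means triggers the exploration rule. A pHMM would additionally fix the state posterior at time steps where a label has been obtained, which this sketch omits.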


