Assessing Direct Monitoring Techniques to Analyze Failures of Critical Industrial Systems / Cinque, Marcello; Cotroneo, Domenico; Della Corte, Raffaele; Pecchia, Antonio. - (2014), pp. 212-222. (Paper presented at the 25th IEEE International Symposium on Software Reliability Engineering, ISSRE 2014, held in Naples, Italy, November 3-6, 2014) [10.1109/ISSRE.2014.30].
Assessing Direct Monitoring Techniques to Analyze Failures of Critical Industrial Systems
CINQUE, MARCELLO;COTRONEO, DOMENICO;DELLA CORTE, RAFFAELE;PECCHIA, ANTONIO
2014
Abstract
The analysis of monitoring data is extremely valuable for critical computer systems. It provides insight into the failure behavior of a system under real workload conditions, which is crucial for assuring service continuity and reducing downtime. This paper presents an experimental evaluation of different direct monitoring techniques, namely event logs, assertions, and source code instrumentation, that are widely used in critical industrial systems. We inject 12,733 software faults into a real-world air traffic control (ATC) middleware system to analyze the ability of these techniques to produce information when failures occur. Experimental results indicate that each technique covers only a limited number of failure manifestations. Moreover, we observe that the quality of the collected data for supporting failure diagnosis varies strongly across the techniques considered in this study.

File | Size | Format
---|---|---
Assessing_Direct_Monitoring_Techniques_to_Analyze_Failures_of_Critical_Industrial_Systems.pdf (authorized users only; type: publisher's version (PDF); license: publisher's copyright) | 809.67 kB | Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.