Principal Component Analysis (PCA) is an eigendecomposition of a suitably transformed data matrix, so its standard application requires the data set to be complete (no missing entries). Alternative implementations that extend PCA to incomplete data sets have been proposed in the literature. Recent comparative reviews of PCA algorithms for data with missing values found the regularised iterative PCA algorithm (RPCA) to be effective. In some applications, incomplete data are produced continuously (e.g. process sensor data), and the corresponding data flow is often analysed in chunks (subsets of observations). In this setting, RPCA could be applied to each chunk separately, with the result that the PCA solutions (and hence the imputations) of the single chunks are independent of one another. An incremental RPCA implementation is proposed such that the imputation of each new chunk is based on that chunk and on all the chunks analysed thus far. The proposed procedure is compared to batch RPCA on different data sets and under different missing data mechanisms. Experimental results show that the incremental approach achieves appreciable performance when the data are missing not completely at random and the first analysed chunks contain sufficient information on the data structure.
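The chunk-wise idea described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `rpca_impute` and `incremental_impute`, the noise estimate (mean of the discarded squared singular values), and the naive "stack all previously imputed chunks" strategy are assumptions made only for illustration. It also assumes every column has at least one observed value in the first chunk.

```python
import numpy as np


def rpca_impute(X, n_components=2, n_iter=200, tol=1e-8):
    """Impute NaNs in X by a regularised iterative PCA (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Initialise missing cells with the column means of the observed entries.
    filled = np.where(missing, np.nanmean(X, axis=0), X)
    for _ in range(n_iter):
        mean = filled.mean(axis=0)
        U, d, Vt = np.linalg.svd(filled - mean, full_matrices=False)
        # Assumed noise estimate: mean of the discarded squared singular values.
        tail = d[n_components:] ** 2
        sigma2 = tail.mean() if tail.size else 0.0
        lead = d[:n_components]
        # Regularisation step: shrink the leading singular values.
        shrunk = np.maximum(lead**2 - sigma2, 0.0) / np.maximum(lead, 1e-12)
        fitted = (U[:, :n_components] * shrunk) @ Vt[:n_components] + mean
        # Only missing cells are replaced; observed values are never altered.
        updated = np.where(missing, fitted, X)
        change = np.sum((updated - filled) ** 2)
        filled = updated
        if change < tol:
            break
    return filled


def incremental_impute(chunks, n_components=2):
    """Impute each chunk using that chunk plus all previously imputed chunks.

    Naive stand-in for an incremental update: earlier chunks are stacked
    (already completed, so their values stay fixed) on top of the new chunk.
    """
    history = None
    imputed_chunks = []
    for chunk in chunks:
        chunk = np.asarray(chunk, dtype=float)
        stacked = chunk if history is None else np.vstack([history, chunk])
        completed = rpca_impute(stacked, n_components)
        # The last len(chunk) rows are the newly imputed chunk.
        imputed_chunks.append(completed[-len(chunk):])
        history = completed
    return imputed_chunks
```

Applying `rpca_impute` to each chunk on its own would correspond to the independent, chunk-by-chunk setting, whereas `incremental_impute` lets earlier chunks inform later imputations. A genuinely incremental method would update the decomposition rather than recompute it on the stacked data; the stacking here only conveys the information flow.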

Regularised PCA for incremental single imputation of missings / Iodice D'Enza, Alfonso; Markos, Angelos; Palumbo, Francesco. - (2022). (Paper presented at the COMPSTAT 2022 conference, held in Bologna, Italy, 23-26 August 2022).

Regularised PCA for incremental single imputation of missings

Alfonso Iodice D'Enza; Angelos Markos; Francesco Palumbo
2022

ISBN: 978-90-73592-40-7
Use this identifier to cite or link to this document: https://hdl.handle.net/11588/920980