Regularised PCA for incremental single imputation of missings

Iodice D'Enza, Alfonso; Markos, Angelos; Palumbo, Francesco

Principal Component Analysis (PCA) is an eigendecomposition of a suitably transformed data matrix, so its standard application requires the data set to be complete (no missing entries). Alternative implementations that extend PCA to incomplete data sets have been proposed in the literature. Recent comparative reviews of PCA algorithms for data with missings found the regularised iterative PCA algorithm (RPCA) to be effective. In some applications, incomplete data are produced continuously (e.g. process sensor data) and the corresponding data flow is often analysed in chunks (subsets of observations). In this setting, RPCA could be applied to each chunk separately, but the PCA solutions (and hence the imputations) of the single chunks would then be independent of one another. An incremental RPCA implementation is proposed such that the imputation of each new chunk is based on that chunk and on all the chunks analysed so far. The proposed procedure is compared to batch RPCA on different data sets and under different missing data mechanisms. Experimental results show that the incremental approach performs appreciably well when the data are missing not completely at random and the first analysed chunks contain sufficient information on the data structure.
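To make the idea concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes a simplified regularised iterative PCA in which the noise variance is estimated as the mean of the discarded squared singular values, and it handles chunks naively by refitting on all rows seen so far; the abstract does not describe the paper's actual incremental update. The names `rpca_impute` and `impute_in_chunks` and the parameter `n_components` are hypothetical, and every column is assumed to have at least one observed value.

```python
import numpy as np


def rpca_impute(X, n_components=2, max_iter=200, tol=1e-6):
    """Simplified regularised iterative PCA imputation (illustrative only)."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Start by filling each missing cell with its column mean.
    filled = np.where(missing, np.nanmean(X, axis=0), X)
    for _ in range(max_iter):
        mu = filled.mean(axis=0)
        U, d, Vt = np.linalg.svd(filled - mu, full_matrices=False)
        # Assumption: noise variance = mean of the discarded squared singular values.
        noise = d[n_components:]
        sigma2 = np.mean(noise ** 2) if noise.size else 0.0
        d_top = d[:n_components]
        # Shrink the retained singular values towards zero (regularisation).
        shrunk = np.maximum(d_top ** 2 - sigma2, 0.0) / np.maximum(d_top, 1e-12)
        fitted = (U[:, :n_components] * shrunk) @ Vt[:n_components] + mu
        # Only missing cells are overwritten; observed values are kept.
        updated = np.where(missing, fitted, X)
        change = np.sum((updated - filled) ** 2)
        filled = updated
        if change < tol:
            break
    return filled


def impute_in_chunks(chunks, n_components=2):
    """Impute each new chunk using that chunk and all chunks seen so far.

    Naive stand-in for an incremental scheme: it refits on the stacked
    rows rather than updating a stored PCA solution.
    """
    seen, imputed = [], []
    for chunk in chunks:
        chunk = np.asarray(chunk, dtype=float)
        seen.append(chunk)
        completed = rpca_impute(np.vstack(seen), n_components)
        imputed.append(completed[-len(chunk):])
    return imputed
```

Calling `rpca_impute` on each chunk alone corresponds to the independent per-chunk analysis mentioned above, whereas `impute_in_chunks` lets earlier chunks inform the imputation of later ones, at the cost of refitting on a growing matrix.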

Regularised PCA for incremental single imputation of missings / Iodice D'Enza, A., Markos, A., Palumbo, F. - (2022). (COMPSTAT 2022, Bologna, Italy, 23-26 August 2022).