Principal component analysis is a versatile statistical method for reducing a cases-by-variables data table to its essential features, called principal components. Principal components are a few linear combinations of the original variables that maximally explain the variance of all the variables. In the process, the method provides an approximation of the original data table using only these few major components. This Primer presents a comprehensive review of the method’s definition and geometry, as well as the interpretation of its numerical and graphical results. The main graphical result is often in the form of a biplot, using the major components to map the cases and adding the original variables to support the distance interpretation of the cases’ positions. Variants of the method are also treated, such as the analysis of grouped data and categorical data, known as correspondence analysis. Also described and illustrated are the latest innovative applications of principal component analysis: for estimating missing values in huge data matrices, sparse component estimation, and the analysis of images, shapes and functions. Supplementary material includes video animations and computer scripts in the R environment.
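The two central ideas in this abstract, components as variance-maximizing linear combinations and a low-rank approximation of the data table, can be illustrated with a minimal sketch. It is not the authors' supplementary R code: it assumes Python with NumPy, computes the components from the singular value decomposition of the column-centered table (no standardization of the variables), and uses a hypothetical function name `pca_svd` and simulated data.

```python
import numpy as np

def pca_svd(X, n_components=2):
    """Illustrative PCA via SVD of the column-centered data table (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean                                   # center each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]  # case coordinates on the components
    loadings = Vt[:n_components].T                   # coefficients of the linear combinations
    explained = (s ** 2 / np.sum(s ** 2))[:n_components]  # share of total variance
    approx = scores @ loadings.T + mean              # rank-k approximation of the table
    return scores, loadings, explained, approx

# Simulated cases-by-variables table: 100 cases, 5 variables driven by 2 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(100, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.05 * rng.normal(size=(100, 5))

scores, loadings, explained, approx = pca_svd(X, n_components=2)
print("variance explained by each component:", explained)
print("approximation error (Frobenius norm):", np.linalg.norm(X - approx))
```

The `scores` give the case positions that a biplot would map, and the rows of `loadings` give the directions along which the original variables would be drawn. Whether to standardize the variables before the decomposition is a separate choice that this sketch leaves out.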

Principal component analysis / Greenacre, Michael; Groenen, Patrick J. F.; Hastie, Trevor; Iodice D'Enza, Alfonso; Markos, Angelos; Tuzhilina, Elena. - In: Nature Reviews Methods Primers. - ISSN 2662-8449. - 2:1 (2022), pp. 1-21. [10.1038/s43586-022-00184-w]

Principal component analysis

Alfonso Iodice D’Enza
2022


Use this identifier to cite or link to this document: https://hdl.handle.net/11588/904766