Mechanism of Principal Component Analysis
Principal component analysis (PCA) is a popular approach for deriving a low-dimensional set of features from a large set of variables. It is an unsupervised learning tool and is often used as a dimension reduction technique in regression problems to tackle the curse of dimensionality.

The Curse of Dimensionality

The dimensionality of a dataset is the number of attributes, or features, it contains. As the dimensionality of a problem increases, so does the chance of including noise features that are not truly associated with the response. Such features degrade the fitted model and increase the test set error, so higher dimensionality exacerbates the risk of overfitting. Even when the added features are relevant, the variance incurred in fitting their coefficients may outweigh the reduction in bias that they bring. Thus, the curse of dimensionality includes the role of the bias-variance trade-off and the danger of ...
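The effect of noise features on test error described above can be illustrated with a small simulation. This is a minimal sketch under assumptions that are not from the original text: synthetic Gaussian data, a response that depends on only two features, and ordinary least squares fitted with NumPy. The function name `fit_with_noise_features` and all parameter values are hypothetical.

```python
import numpy as np


def fit_with_noise_features(n_noise, n_train=100, n_test=1000, seed=0):
    """Fit least squares with `n_noise` irrelevant features added.

    Returns (train_mse, test_mse). All data are synthetic (an assumption
    made for illustration only).
    """
    rng = np.random.default_rng(seed)
    n = n_train + n_test

    # Two features that truly drive the response, plus unit-variance noise.
    X_signal = rng.normal(size=(n, 2))
    eps = rng.normal(size=n)
    y = 3.0 * X_signal[:, 0] - 2.0 * X_signal[:, 1] + eps

    # Noise features: independent of the response by construction.
    X_noise = rng.normal(size=(n, n_noise))

    # Design matrix with an intercept column.
    X = np.column_stack([np.ones(n), X_signal, X_noise])
    X_tr, X_te = X[:n_train], X[n_train:]
    y_tr, y_te = y[:n_train], y[n_train:]

    # Ordinary least squares fit on the training rows only.
    beta, *_ = np.linalg.lstsq(X_tr, y_tr, rcond=None)

    train_mse = float(np.mean((y_tr - X_tr @ beta) ** 2))
    test_mse = float(np.mean((y_te - X_te @ beta) ** 2))
    return train_mse, test_mse


for k in (0, 10, 40, 80):
    train_mse, test_mse = fit_with_noise_features(k)
    print(f"noise features: {k:3d}  train MSE: {train_mse:.3f}  test MSE: {test_mse:.3f}")
```

In a run of this sketch, the training error should fall as noise features are added while the test error rises, which is the overfitting pattern the paragraph describes. The exact numbers depend on the seed and sample sizes chosen here.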