I am using part of the iris data set to get a better understanding of PCA.
Here's my code:
from sklearn.datasets import load_iris
import numpy as np
import matplotlib.pyplot as plt
from sklearn import decomposition
dataset = load_iris()
X = dataset.data[:20,]
pca = decomposition.PCA(n_components=4)
pca.fit(X)
X = pca.transform(X)
print(X)
print()
print(pca.explained_variance_ratio_)
print(pca.explained_variance_)
print(pca.noise_variance_)
print()
print(pca.components_)
print()
pca = decomposition.PCA(n_components=3)
pca.fit(X)
X = pca.transform(X)
print(X)
print()
print(pca.explained_variance_ratio_)
print(pca.explained_variance_)
print(pca.noise_variance_)
print()
print(pca.components_)
print()
pca = decomposition.PCA(n_components=2)
pca.fit(X)
X = pca.transform(X)
print(X)
print()
print(pca.explained_variance_ratio_)
print(pca.explained_variance_)
print(pca.noise_variance_)
print()
print(pca.components_)
print()
pca = decomposition.PCA(n_components=1)
pca.fit(X)
X = pca.transform(X)
print(X)
print()
print(pca.explained_variance_ratio_)
print(pca.explained_variance_)
print(pca.noise_variance_)
print()
print(pca.components_)
print()
The data (the first 20 rows of the iris data set):

| F1 | F2 | F3 | F4 | Label |
|-----|-----|-----|-----|-------|
|5.1 |3.5 |1.4 |0.2 | 0 |
|4.9 |3.0 |1.4 |0.2 | 0 |
|4.7 |3.2 |1.3 |0.2 | 0 |
|4.6 |3.1 |1.5 |0.2 | 0 |
|5.0 |3.6 |1.4 |0.2 | 0 |
|5.4 |3.9 |1.7 |0.4 | 0 |
|4.6 |3.4 |1.4 |0.3 | 0 |
|5.0 |3.4 |1.5 |0.2 | 0 |
|4.4 |2.9 |1.4 |0.2 | 0 |
|4.9 |3.1 |1.5 |0.1 | 0 |
|5.4 |3.7 |1.5 |0.2 | 0 |
|4.8 |3.4 |1.6 |0.2 | 0 |
|4.8 |3.0 |1.4 |0.1 | 0 |
|4.3 |3.0 |1.1 |0.1 | 0 |
|5.8 |4.0 |1.2 |0.2 | 0 |
|5.7 |4.4 |1.5 |0.4 | 0 |
|5.4 |3.9 |1.3 |0.4 | 0 |
|5.1 |3.5 |1.4 |0.3 | 0 |
|5.7 |3.8 |1.7 |0.3 | 0 |
|5.1 |3.8 |1.5 |0.3 | 0 |
Output:

[[ -5.35882132e-02 2.13091549e-02 5.63776995e-02 2.38909674e-02]
[ 4.31102885e-01 2.27802156e-01 7.74776903e-02 -8.56077547e-02]
[ 4.46437821e-01 -6.48981661e-02 7.80252213e-02 -2.16463511e-02]
[ 5.70213598e-01 1.37832371e-02 -1.17201913e-01 -2.27730577e-03]
[ -4.99837824e-02 -1.06433448e-01 1.11801355e-02 6.42148516e-02]
[ -5.88493547e-01 1.19234918e-02 -2.42112963e-01 -4.46036896e-02]
[ 3.62588639e-01 -2.42562846e-01 -9.89230051e-02 -3.13366123e-02]
[ 7.83136388e-02 6.27754417e-02 -4.79067754e-02 2.65736478e-02]
[ 8.58395527e-01 -1.49295381e-02 -5.29428852e-02 -4.69710396e-02]
[ 3.65880852e-01 2.20160693e-01 -4.51271386e-03 5.21066893e-02]
[ -4.13586321e-01 1.11767646e-01 2.13883619e-02 5.54246013e-02]
[ 2.13819922e-01 -2.35008745e-02 -1.97388814e-01 6.95802124e-02]
[ 5.14034854e-01 1.87196747e-01 7.30881295e-02 2.14166399e-02]
[ 8.97493973e-01 -2.33177183e-01 1.99567657e-01 3.71580447e-02]
[ -8.81108056e-01 4.91145021e-02 3.63511477e-01 3.42164603e-02]
[ -1.12874867e+00 -2.07254026e-01 -5.20579454e-02 1.83622028e-02]
[ -5.55989247e-01 -1.36936973e-01 1.21657674e-01 -1.11349149e-01]
[ -6.47040031e-02 1.68848098e-04 3.14975704e-02 -6.99733273e-02]
[ -7.24614545e-01 2.84297834e-01 -1.13495890e-01 -1.73834789e-02]
[ -2.77465322e-01 -1.60606696e-01 -1.07228711e-01 2.82043907e-02]]
[ 0.87954353 0.06300167 0.05039505 0.00705974]
[ 0.31612993 0.02264438 0.01811324 0.00253745]
0.0
[[-0.71816179 -0.68211748 -0.08126075 -0.1111579 ]
[ 0.61745716 -0.65996887 0.37215116 -0.21140307]
[ 0.2926969 -0.15927874 -0.90942659 -0.24880129]
[-0.131601 0.27163784 0.16686365 -0.93864295]]
[[ -5.35882132e-02 2.13091549e-02 -5.63776995e-02]
[ 4.31102885e-01 2.27802156e-01 -7.74776903e-02]
[ 4.46437821e-01 -6.48981661e-02 -7.80252213e-02]
[ 5.70213598e-01 1.37832371e-02 1.17201913e-01]
[ -4.99837824e-02 -1.06433448e-01 -1.11801355e-02]
[ -5.88493547e-01 1.19234918e-02 2.42112963e-01]
[ 3.62588639e-01 -2.42562846e-01 9.89230051e-02]
[ 7.83136388e-02 6.27754417e-02 4.79067754e-02]
[ 8.58395527e-01 -1.49295381e-02 5.29428852e-02]
[ 3.65880852e-01 2.20160693e-01 4.51271386e-03]
[ -4.13586321e-01 1.11767646e-01 -2.13883619e-02]
[ 2.13819922e-01 -2.35008745e-02 1.97388814e-01]
[ 5.14034854e-01 1.87196747e-01 -7.30881295e-02]
[ 8.97493973e-01 -2.33177183e-01 -1.99567657e-01]
[ -8.81108056e-01 4.91145021e-02 -3.63511477e-01]
[ -1.12874867e+00 -2.07254026e-01 5.20579454e-02]
[ -5.55989247e-01 -1.36936973e-01 -1.21657674e-01]
[ -6.47040031e-02 1.68848098e-04 -3.14975704e-02]
[ -7.24614545e-01 2.84297834e-01 1.13495890e-01]
[ -2.77465322e-01 -1.60606696e-01 1.07228711e-01]]
[ 0.87954353 0.06300167 0.05039505]
[ 0.31612993 0.02264438 0.01811324]
0.00253744874373
[[ 1.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]
[ -0.00000000e+00 1.00000000e+00 -3.33066907e-15 0.00000000e+00]
[ 0.00000000e+00 -3.10862447e-15 -1.00000000e+00 -3.60822483e-16]]
[[ -5.35882132e-02 2.13091549e-02]
[ 4.31102885e-01 2.27802156e-01]
[ 4.46437821e-01 -6.48981661e-02]
[ 5.70213598e-01 1.37832371e-02]
[ -4.99837824e-02 -1.06433448e-01]
[ -5.88493547e-01 1.19234918e-02]
[ 3.62588639e-01 -2.42562846e-01]
[ 7.83136388e-02 6.27754417e-02]
[ 8.58395527e-01 -1.49295381e-02]
[ 3.65880852e-01 2.20160693e-01]
[ -4.13586321e-01 1.11767646e-01]
[ 2.13819922e-01 -2.35008745e-02]
[ 5.14034854e-01 1.87196747e-01]
[ 8.97493973e-01 -2.33177183e-01]
[ -8.81108056e-01 4.91145021e-02]
[ -1.12874867e+00 -2.07254026e-01]
[ -5.55989247e-01 -1.36936973e-01]
[ -6.47040031e-02 1.68848098e-04]
[ -7.24614545e-01 2.84297834e-01]
[ -2.77465322e-01 -1.60606696e-01]]
[ 0.88579703 0.06344961]
[ 0.31612993 0.02264438]
0.0181132415475
[[ 1.00000000e+00 0.00000000e+00 0.00000000e+00]
[ -0.00000000e+00 1.00000000e+00 -5.55111512e-16]]
[[-0.05358821]
[ 0.43110288]
[ 0.44643782]
[ 0.5702136 ]
[-0.04998378]
[-0.58849355]
[ 0.36258864]
[ 0.07831364]
[ 0.85839553]
[ 0.36588085]
[-0.41358632]
[ 0.21381992]
[ 0.51403485]
[ 0.89749397]
[-0.88110806]
[-1.12874867]
[-0.55598925]
[-0.064704 ]
[-0.72461455]
[-0.27746532]]
[ 0.93315793]
[ 0.31612993]
0.0226443764968
[[ 1. 0.]]
In my dataset, F1 has the highest variance. How is this visible in the output of the PCA?
What exactly does "explained variance" mean here? Does this mean how much the original feature influenced the variance of the newly calculated values?
Why is the noise variance 0 for the first example with 4 components?
What exactly are the components_? Are they the n-dimensional eigenvectors?
F1 has the highest variance. How is this visible in the output of the PCA?
PCA is a feature transformation technique that rotates your original data dimensions into a new orthonormal feature space. In the new feature space, the principal components (the orthonormal eigenvectors of the covariance matrix of your mean-centered data) form the dimensions of the space. Each component is a linear combination of your original feature dimensions. Consider the following code: each row of pca.components_ holds the coefficients of one component, so the dominant principal component PC1 (the direction capturing the highest variance in the data) can be written as PC1 = -0.718162*F1 - 0.682117*F2 - 0.081261*F3 - 0.111158*F4. F1 has the largest coefficient (in absolute value) in PC1, which is how its high variance shows up in the PCA output: the direction of maximum variance is dominated by F1.
import pandas as pd
# assuming pca is the first model above: n_components=4, fit on the original 20x4 data X
# rows of pca.components_ are the components, columns are the original features
print(pd.DataFrame(pca.components_, index=['PC1', 'PC2', 'PC3', 'PC4'], columns=['F1', 'F2', 'F3', 'F4']))
#            F1        F2        F3        F4
# PC1 -0.718162 -0.682117 -0.081261 -0.111158
# PC2  0.617457 -0.659969  0.372151 -0.211403
# PC3  0.292697 -0.159279 -0.909427 -0.248801
# PC4 -0.131601  0.271638  0.166864 -0.938643
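To make the "linear combination" point concrete, here is a minimal check (assuming pca is still that 4-component model fit on the original 20x4 X): projecting the mean-centered data onto the component vectors reproduces exactly what pca.transform returns.
import numpy as np
# pca.mean_ holds the per-feature means learned during fit()
X_centered = X - pca.mean_
manual = X_centered.dot(pca.components_.T)   # project onto the principal axes by hand
print(np.allclose(manual, pca.transform(X)))
#True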
What exactly does "explained variance" mean here? Does this mean how much the original feature influenced the variance of the newly calculated values?
It is the amount of variance explained by each of the selected components, which is obtained by simply taking the variance of each column returned by pca.transform (i.e., the variance of the transformed features, not of the original ones); see the following code:
X = pca.transform(X)
print(np.var(X, axis=0))
#[ 0.31612993 0.02264438 0.01811324 0.00253745]
print(pca.explained_variance_)
#[ 0.31612993 0.02264438 0.01811324 0.00253745]
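Closely related, explained_variance_ratio_ is each of these variances divided by the total variance, so with all 4 components kept the ratios are just the variances normalized to sum to 1. A minimal sketch under the same assumption of the 4-component fit (note that newer scikit-learn versions compute explained_variance_ with the unbiased n-1 estimator, so the exact match with np.var above may then require ddof=1):
# with all 4 components kept, their variances account for the total variance
print(pca.explained_variance_ / pca.explained_variance_.sum())
#[ 0.87954353  0.06300167  0.05039505  0.00705974]
print(pca.explained_variance_ratio_)
#[ 0.87954353  0.06300167  0.05039505  0.00705974]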
Why is the noise variance 0 for the first example with 4 components?
Because no dimension reduction is performed in the first case: the feature space is merely rotated and all 4 components are kept, none is discarded, so no information is lost. noise_variance_ is the average variance of the components that were dropped, and with nothing dropped there is nothing left over to count as noise, hence 0. You can see this in your later runs: the 3-component fit reports noise_variance_ = 0.00253745, which is exactly the variance of the one component it discarded.
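Note that in your script each later PCA is refit on the already-transformed (and already-truncated) X from the previous step, so every run after the first discards exactly one column, which is why each reported noise_variance_ equals the variance of a single dropped component. With fresh fits on the original 20x4 X it would be the average variance of all the dropped components; a short sketch of that relationship:
from sklearn import decomposition
import numpy as np

pca_full = decomposition.PCA(n_components=4).fit(X)   # X is the original 20x4 data here
pca_two = decomposition.PCA(n_components=2).fit(X)    # drops the two smallest components

print(pca_full.noise_variance_)
#0.0  (nothing was dropped)
# noise_variance_ of the reduced model is the mean variance of the discarded components
print(pca_two.noise_variance_)
print(pca_full.explained_variance_[2:].mean())   # same value as the line above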
What exactly are the components_? Are they the n-dimensional eigenvectors?
The components can be thought of as the orthonormal eigenvectors of the covariance matrix of the mean-centered data, although, as the documentation says, they are computed in a more numerically stable way via Singular Value Decomposition of the centered data, in which case they are the right singular vectors.
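A quick way to check this (again assuming X is the original 20x4 data and pca is the 4-component model): the rows of pca.components_ agree, up to a sign flip per row, with the right singular vectors of the mean-centered data, and equally with the eigenvectors of its covariance matrix. Absolute values are compared because the sign of each vector is arbitrary.
import numpy as np

Xc = X - X.mean(axis=0)                       # PCA centers the data internally
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
print(np.allclose(np.abs(Vt), np.abs(pca.components_)))
#True

eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
# eigh returns eigenvectors as columns in ascending eigenvalue order; reorder to descending rows
print(np.allclose(np.abs(eigvecs[:, ::-1].T), np.abs(pca.components_)))
#True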