
PCA doesn't reduce the dimensionality of my data

I would like to apply PCA to heatmaps with 18 dimensions.


Since PCA only accepts data with at most 2 dimensions, I reshaped my heatmaps to the following shape:

(50176, 18)

Now, I would like to apply PCA and keep the first components that preserve 95% of the variance.

from sklearn.decomposition import PCA
pca = PCA(n_components=18)

However, the shape of reduced_heatmaps remains the same as that of the original heatmaps: (50176, 18).

My question is as follows: how can I reduce the dimensionality of my heatmaps while preserving 95% of the variance?

Strange thing: the cumulative explained variance ratio is

array([ 0.05744624,  0.11482341,  0.17167621,  0.22837643,  0.284996  ,
        0.34127299,  0.39716828,  0.45296374,  0.50849681,  0.56382308,
        0.61910508,  0.67425335,  0.72897448,  0.78361028,  0.83813329,
        0.89247688,  0.94636864,  1.        ])

This means I would need to keep at least 17 of the 18 components (which together explain only about 94.6% of the variance), so the dimensionality is barely reduced.

What is wrong?
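
As a quick check of the numbers above, here is a minimal sketch that takes the cumulative array exactly as printed in the question and works out the per-component share and the smallest number of components reaching 95%. The variable names (`cumulative`, `per_component`, `n_needed`) are illustrative and not from the original code. Each component turns out to carry roughly 1/18 of the variance, and the 95% threshold is only crossed at the 18th component.

```python
import numpy as np

# Cumulative explained variance copied from the output in the question.
cumulative = np.array([
    0.05744624, 0.11482341, 0.17167621, 0.22837643, 0.284996,
    0.34127299, 0.39716828, 0.45296374, 0.50849681, 0.56382308,
    0.61910508, 0.67425335, 0.72897448, 0.78361028, 0.83813329,
    0.89247688, 0.94636864, 1.0,
])

# Per-component share of the variance (differences of the cumulative sum).
per_component = np.diff(cumulative, prepend=0.0)

# Smallest number of components whose cumulative variance reaches 95%.
n_needed = int(np.argmax(cumulative >= 0.95)) + 1

print(per_component.round(4))
print(n_needed)
```

A flat spectrum like this one suggests that, in the (50176, 18) layout, the 18 columns are close to uncorrelated with similar variance, so there is little for PCA to compress.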

EDIT: following the suggestions of Eric Yang


Then applying PCA as follows:

pca = PCA(n_components=11)
gives the following:
array([ 0.21121199,  0.33070526,  0.44827572,  0.55748779,  0.64454442,
        0.72588593,  0.7933346 ,  0.85083687,  0.89990991,  0.9306283 ,
        0.9596194 ], dtype=float32)

11 components are needed to explain 95% of the variance in my data.

(18, 11)

Hence we go from (18, 50176) to (18, 11).
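
For reference, a minimal sketch of this transposed setup, assuming the original array has shape (224, 224, 18) and using random data as a stand-in for the real heatmaps (so the number of components it keeps will differ from the 11 reported above). Instead of choosing the component count by hand, it passes a float to `n_components`, which scikit-learn interprets as the fraction of variance to keep when the full SVD solver is used. The names `heatmaps`, `X` and `reduced` are illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for the real data: 18 heatmaps of 224 x 224 pixels.
rng = np.random.default_rng(0)
heatmaps = rng.random((224, 224, 18))

# One row per heatmap, one column per pixel: shape (18, 50176).
X = heatmaps.reshape(-1, 18).T

# A float n_components in (0, 1) asks PCA to keep just enough components
# to explain that fraction of the variance (used here with the full SVD solver).
pca = PCA(n_components=0.95, svd_solver="full")
reduced = pca.fit_transform(X)

print(X.shape, "->", reduced.shape)
print(np.cumsum(pca.explained_variance_ratio_))
```

With only 18 samples, PCA can return at most 18 components regardless of the 50176 features, which is why the reduced shape is so small in this layout.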

Thank you for your help


  • How much PCA can reduce the dimensionality depends on your data. If you have an N-dimensional Gaussian in which each dimension is independently distributed as N(0,1), each dimension will explain 1/N of the variance, so your ability to reduce dimensions via PCA would be minimal. The results of PCA therefore do not seem to be incorrect.

    Now, based on a superficial understanding of your problem: you have 18 images that are 224x224, correct? If so, your dimensionality is 224x224 = 50176, not 18, and the question to ask is what minimum number of pixels explains the differences between your 18 images. (However, I could be wrong if that assumption does not hold and what you have is 18 channels for 1 image.)

    There is one other possibility: you have a series of similar images (so your dimensionality is 18) and you are looking for the eigenimage. If the images are too different, you will get only a minimal reduction in dimensionality.
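
To illustrate the Gaussian point from the first paragraph of this answer, here is a small sketch using synthetic data only (the sample count of 50,000 and the variable names are arbitrary choices, not taken from the question). It draws 18 independent N(0,1) features and computes the explained variance ratio from the eigenvalues of the sample covariance matrix, which is what PCA reports.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_dims = 50_000, 18

# Independent N(0, 1) features: no direction carries more variance than another.
X = rng.standard_normal((n_samples, n_dims))

# PCA's explained variance ratio is the normalized eigenvalue spectrum
# of the sample covariance matrix (sorted here in descending order).
eigenvalues = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]
explained_ratio = eigenvalues / eigenvalues.sum()

print(explained_ratio.round(3))             # every entry is close to 1/18
print(np.cumsum(explained_ratio).round(3))  # grows almost linearly up to 1
```

The nearly linear cumulative curve this produces has the same shape as the first array in the question, which is consistent with the 18 columns there being close to uncorrelated.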