Image compression using PCA

Not sure if this is the right place to ask this question.

I have a question about PCA with regard to storage space.

If we were to use PCA to compress images, we would have to store at least 1) the number of principal components and 2) the mean-subtracted NumPy array.

Since the mean-subtracted array is the same size as the original image array, the amount of storage required is the same. So where is the compression?

Solution

First: using PCA to compress images is possible, but not without loss (lossless compression with PCA doesn't make sense). The basic idea is to minimize the number of dimensions while retaining as much of the variance as possible.

Assume you have n images, each of size x * y.

Then you would compute a single mean image of size x * y, which you would have to store. Next, you could use the top k eigenvectors (principal components) to reduce dimensionality: each mean-subtracted image is projected from x * y dimensions down to k dimensions, where k depends on how much variance you want to keep. Finally, you would need to store the top k eigenvectors themselves, which form a matrix of size k * (x * y).
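The steps above can be sketched in NumPy roughly as follows. This is a minimal illustration, not a reference implementation: the function names pca_compress and pca_decompress are made up for this example, the images are random placeholder data, and the principal components are obtained via an SVD of the mean-subtracted data rather than an explicit eigendecomposition of the covariance matrix.

```python
import numpy as np

def pca_compress(images, k):
    """images has shape (n, x, y); returns (codes, mean, components)."""
    n = images.shape[0]
    flat = images.reshape(n, -1).astype(np.float64)  # (n, x*y)
    mean = flat.mean(axis=0)                         # mean image, (x*y,)
    centered = flat - mean
    # Rows of vt are the principal directions, ordered by
    # decreasing singular value (i.e. decreasing variance).
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:k]                              # (k, x*y)
    codes = centered @ components.T                  # (n, k)
    return codes, mean, components

def pca_decompress(codes, mean, components, shape):
    """Approximate reconstruction of the images from the stored pieces."""
    flat = codes @ components + mean
    return flat.reshape((-1,) + tuple(shape))

# Placeholder data: 20 random "images" of size 8 * 8.
rng = np.random.default_rng(0)
images = rng.random((20, 8, 8))

codes, mean, components = pca_compress(images, k=5)
restored = pca_decompress(codes, mean, components, (8, 8))

print(codes.shape, mean.shape, components.shape)
print("mean squared error:", np.mean((images - restored) ** 2))
```

Only codes, mean and components need to be stored; the mean-subtracted array itself is an intermediate result and is thrown away.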

To sum up: You could reduce n images of size x * y to

a) n arrays of size k

b) a single mean image of size x * y

c) a matrix of size k * (x * y) containing the top k principal components

Whether this actually results in compression depends on your choice of k and on the number of images n: the n * k + x * y + k * (x * y) values you store must be fewer than the original n * x * y.
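To make that dependence concrete, here is a small sketch that just counts stored values for a), b) and c). The helper name pca_storage and the example sizes (1000 images of 64 * 64, k = 50) are made up for illustration. It counts values, not bytes: in practice the coefficients and components are usually floats while the original pixels are often 8-bit integers, so the real saving would be smaller than the count suggests.

```python
def pca_storage(n, x, y, k):
    """Return (original, compressed) as counts of stored values."""
    d = x * y
    original = n * d
    compressed = n * k + d + k * d  # a) codes + b) mean image + c) components
    return original, compressed

# Many images, small k: far fewer values to store.
print(pca_storage(1000, 64, 64, 50))  # (4096000, 258896)

# A single image: the mean image alone is already as large as the original.
print(pca_storage(1, 64, 64, 1))      # (4096, 8193)
```

With a single image there is never any saving, since b) alone costs as much as the image; the gain only appears when the mean image and the components are shared across many images.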

So the compression is possible in theory, but it is lossy: the original images can in general only be approximated from the stored data.