python scikit-learn pca

Principal Component Analysis (PCA) in Python


I have a 26424 x 144 array and I want to perform PCA on it using Python. However, I can't find anywhere on the web that explains how to do this in a general way: the sites I have found each implement PCA in their own manner. Any help would be appreciated.


Solution

  • You can find a PCA function in the matplotlib module:

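    import numpy as np
    from matplotlib.mlab import PCA  # only available in matplotlib < 3.1
    
    # 10 observations (rows) of 3 variables (columns);
    # mlab.PCA requires more rows than columns
    data = np.random.randint(10, size=(10, 3)).astype(float)
    results = PCA(data)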
    

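    `results` stores the fitted PCA: for example, `results.Y` is the data projected into PCA space, `results.Wt` holds the weight vectors (principal axes) and `results.fracs` gives the fraction of variance explained by each component. PCA comes from the mlab module of matplotlib, a set of functions with MATLAB-compatible names and behavior. Note that mlab.PCA was deprecated in matplotlib 2.2 and removed in 3.1, so this only works with older matplotlib versions.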

    EDIT: on the blog nextgenetics I found a good demonstration of how to perform and plot a PCA with the matplotlib mlab module; it is worth checking out.
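Since mlab.PCA no longer ships with current matplotlib, here is a minimal NumPy-only sketch of the same computation: center each column, optionally scale it to unit variance, then take a thin SVD. The function name `pca_svd` is hypothetical, and the returned keys (`Y`, `Wt`, `fracs`) are only named after the old mlab.PCA attributes; this is not a drop-in replacement. The random data is a placeholder with the question's 26424 x 144 shape.

```python
import numpy as np


def pca_svd(a, standardize=True):
    """Hypothetical helper: PCA via SVD, rows = observations, columns = variables."""
    a = np.asarray(a, dtype=float)
    mu = a.mean(axis=0)
    sigma = a.std(axis=0)
    centered = a - mu
    if standardize:
        # Scale each column to unit variance (assumes no constant columns).
        centered = centered / sigma
    # Thin SVD: centered = U @ diag(s) @ Vt; the rows of Vt are the principal axes,
    # already sorted by decreasing singular value.
    U, s, Vt = np.linalg.svd(centered, full_matrices=False)
    variances = s ** 2
    return {
        "mu": mu,
        "sigma": sigma,
        "Wt": Vt,                              # principal axes, one per row
        "Y": centered @ Vt.T,                  # data projected onto the axes
        "fracs": variances / variances.sum(),  # fraction of variance per component
    }


rng = np.random.default_rng(0)
data = rng.random((26424, 144))  # placeholder data with the question's shape
result = pca_svd(data)
print(result["Y"].shape)  # (26424, 144)
print(result["fracs"][:5])
```

To reduce the data to `k` dimensions, keep the first `k` columns of `result["Y"]`. If scikit-learn is available, `sklearn.decomposition.PCA(n_components=k).fit_transform(data)` is the maintained alternative; note that it centers the data but does not scale it to unit variance, so it corresponds to `standardize=False` here.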