python scikit-learn pca

What are the axes on PCA scatter plots?


My team is training multiple models to compare their accuracy/precision/recall. We have generated scatter plots using scikit-learn, and the scatter plots look like the following:

[Image: scatter plot]

We have been doing some research and cannot find what the X and Y axes represent. We've read through the following article which has similar results:

https://scikit-learn.org/stable/auto_examples/neighbors/plot_nca_dim_reduction.html

In our case, we have a high number of dimensions (more than 20). From our research, we've found that the dimensions are condensed into just 2 dimensions, which I assume are these X and Y axes. Is this the case? And if so, what do these represent?


Solution

  • Digging into the code from the scikit-learn tutorials you have linked to, we see:

    # Embed the data set in 2 dimensions using the fitted model
    X_embedded = model.transform(X)
    
    # Plot the projected points and show the evaluation score
    plt.scatter(X_embedded[:, 0], X_embedded[:, 1], c=y, s=30, cmap='Set1')
    

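    So yes, the X and Y axes are the two dimensions your data has been reduced to: X_embedded[:, 0] is plotted on the X axis and X_embedded[:, 1] on the Y axis. When the model is PCA, these are the first two principal components, i.e. the two orthogonal directions along which the data varies the most. Each one is a weighted combination of all your original features, so an axis does not correspond to any single input column and has no unit of its own. For the other models in that example (LDA and NCA), the axes are likewise two learned linear combinations of the features, but they are chosen by a different criterion, so strictly speaking they are not principal components.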
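    To see what each axis is made of, you can inspect the fitted PCA object. The following is a minimal sketch, not your team's pipeline: the data is synthetic (200 samples, 25 features), and the scaling step is an assumption that is commonly applied before PCA. It prints the per-feature weights behind each axis and the share of variance each axis captures, then checks that the plotted coordinates are simply the centered data projected onto those weights.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a real data set (assumption): 200 samples, 25 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 25))
# Make feature 1 strongly correlated with feature 0 so PCA has structure to find.
X[:, 1] = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=200)

# Standardize so that no feature dominates purely because of its scale.
X_scaled = StandardScaler().fit_transform(X)

# Reduce to 2 dimensions; these become the X and Y axes of the scatter plot.
pca = PCA(n_components=2)
X_embedded = pca.fit_transform(X_scaled)

# Each row of components_ is one axis: one weight per original feature.
print(pca.components_.shape)  # (2, 25)

# Fraction of the total variance captured by each axis.
print(pca.explained_variance_ratio_)

# The plotted coordinates are the centered data projected onto the two axes.
manual = (X_scaled - pca.mean_) @ pca.components_.T
assert np.allclose(manual, X_embedded)
```

    Features with large absolute weights in a row of pca.components_ are the ones that drive that axis, which is usually the most practical way to give the X and Y axes an interpretation.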