python scikit-learn pca

Using PCA for dimensionality reduction: why don't all the digits appear in the graph?


I used the digits dataset from scikit-learn and tried to reduce its dimensionality from 64 to 2 with PCA:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
#%matplotlib inline
from sklearn.decomposition import PCA
from sklearn.datasets import load_digits

digits = load_digits()
digits_df = pd.DataFrame(digits.data,)
digits_df["target"] = pd.Series(digits.target)

pca = PCA(n_components=2)

digits_pca = pca.fit_transform(digits_df.iloc[:,:64])
digits_df_pca = pd.DataFrame(digits_pca,
                            columns =["Component1","Component2"])

finalDf = pd.concat([digits_df_pca, digits_df["target"]], axis = 1)

plt.figure(figsize=(10,10))
sns.scatterplot(data=finalDf, x="Component1", y="Component2", hue="target")

The graph:

[Image: scatter plot of Component1 vs. Component2, colored by target]

The only digits that appear in the graph are 0, 3, 6, and 9. Why can't I see the other six digits?


Solution

  • First check that all the labels are actually present in your data, for example by building a set of the target values.
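
    A minimal sketch of that label check, assuming the same load_digits data as in the question (the variable names labels and missing are only illustrative):

    ```python
    from sklearn.datasets import load_digits

    digits = load_digits()

    # Collect the distinct target values. The equivalent check on the
    # DataFrame from the question would be set(finalDf["target"]).
    labels = set(digits.target.tolist())
    print(sorted(labels))

    # Any digit from 0 to 9 that never occurs in the data would be listed here.
    missing = set(range(10)) - labels
    print("missing:", sorted(missing))
    ```

    An empty "missing" list means the data is complete and the problem lies in how the plot is drawn.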

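    If they are, the points for every digit are already being plotted and only the legend is abbreviated: for a numeric hue variable, seaborn's default legend shows a sample of evenly spaced values rather than every level. Pass legend='full' to get one entry per digit: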

    sns.scatterplot(data=finalDf,x="Component1", y = "Component2",hue="target",
                   legend = 'full')
    

    Working code, which also passes a 10-color "bright" palette so that each digit gets its own distinct color:

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns
    #%matplotlib inline
    from sklearn.decomposition import PCA
    from sklearn.datasets import load_digits
    
    digits = load_digits()
    digits_df = pd.DataFrame(digits.data,)
    digits_df["target"] = pd.Series(digits.target)
    
    pca = PCA(n_components=2)
    
    digits_pca = pca.fit_transform(digits_df.iloc[:,:64])
    digits_df_pca = pd.DataFrame(digits_pca,
                                columns =["Component1","Component2"])
    
    finalDf = pd.concat([digits_df_pca, digits_df["target"]], axis = 1)
    
    plt.figure(figsize=(10,10))
    palette = sns.color_palette("bright", 10)
    sns.scatterplot(data=finalDf,x="Component1", y = "Component2",hue="target",
                   legend = 'full', palette = palette)
    

    [Image: the same scatter plot, with legend='full' and the "bright" palette]
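
To confirm that only the legend differs, the legend entries can be read back from the axes instead of judged by eye. The following is a rough sketch rather than part of the original answer: the helper name legend_labels is made up for illustration, and the Agg backend is selected only so the snippet runs without a display. The exact entries in the default legend may vary between seaborn versions, so none are listed here.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, assumed here so no window is needed
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

digits = load_digits()
components = PCA(n_components=2).fit_transform(digits.data)
df = pd.DataFrame(components, columns=["Component1", "Component2"])
df["target"] = digits.target


def legend_labels(**kwargs):
    """Draw the scatter plot and return the text of each legend entry."""
    fig, ax = plt.subplots()
    sns.scatterplot(data=df, x="Component1", y="Component2",
                    hue="target", ax=ax, **kwargs)
    labels = [text.get_text() for text in ax.get_legend().get_texts()]
    plt.close(fig)
    return labels


default_labels = legend_labels()            # abbreviated legend for a numeric hue
full_labels = legend_labels(legend="full")  # one entry per distinct target value

print("default:", default_labels)
print("full:   ", full_labels)
```

With legend="full" the list should contain every digit from 0 to 9, while the default legend is shorter even though both plots draw exactly the same points.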