Tags: python-3.x, 3d, pca

Generate 3D PCA separation plane plot


I want to generate a 3D plot that shows the separation between the two classes. I looked at this solution, but I don't know how to add the separation plane to a px.scatter_3d plot.

Here is the code that I have so far:

import pandas as pd
import plotly.express as px
from sklearn.decomposition import PCA

df = pd.read_csv('df00_snippet.csv')
X_train_flat = df.drop(columns=['Label']).values

ydata = df['Label'].values

# Fit PCA and project the data; only the first 3 components are plotted
pca_train = PCA(n_components=4)
x_pca = pca_train.fit_transform(X_train_flat)

# String labels make plotly treat the classes as discrete colours
y_train_new = ydata.astype(str)

# https://plotly.com/python/3d-scatter-plots/
fig = px.scatter_3d(x_pca,
            x=x_pca[:, 0], y=x_pca[:, 1], z=x_pca[:, 2],
            labels={'x': 'PCA-1', 'y': 'PCA-2', 'z': 'PCA-3'},
            size_max=13,
            #symbol=y_train_new,
            opacity=1,
            color=y_train_new,
            color_discrete_sequence=["blue", "green"],
            title='3d Plot of Top 3 PCA components')
fig.show()

Here is a snippet of my data:

feat1         feat2         feat3         feat4       Label
-3.8481877    -0.47685334   0.63422906    1.0396314   1
-2.320888     0.65347993    1.1519914     0.12997247  1
1.5827686     1.4119303     -1.7410104    -4.6962333  1
-0.1337152    0.13315737    -1.6648949    -1.4205348  1
-0.4028037    1.332986      1.3618442     0.3292255   1
-0.015517877  1.346349      1.4083523     0.87017965  1
-0.2669228    0.5478992     -0.06730786   -1.5959451  1
-0.03318152   0.3263167     -2.116833     -5.4616213  1
0.4588691     0.6723614     -1.617398     -4.3511734  1
0.5899199     0.66525555    -1.694493     -3.9452586  1
1.610061      2.4186094     1.8807093     1.3764497   0
1.7985699     2.4387648     1.6306056     1.1184534   0
-9.222036     -9.9776       -9.832        -9.909746   0
0.21364458    -1.0171559    -4.9093766    -6.2154694  0
-0.019955145  -1.1677283    -4.6549516    -5.9503417  0
0.44730473    -0.77167743   -4.7527356    -5.971007   0
-0.16508447   -0.005777468  -1.5020386    -4.49326    0
-0.8654994    -0.54387957   -1.300646     -4.621529   0
-1.7471086    -2.0005553    -1.7533782    -2.6065414  0
-1.5313624    -1.6995796    -1.4394685    -2.600004   0
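To make the example reproducible without 'df00_snippet.csv', the snippet above can be parsed straight from a string. This is a minimal sketch, assuming the columns are whitespace-separated exactly as shown; 'SNIPPET' is a hypothetical name, while 'X_train_flat' and 'ydata' match the names used in the code.

```python
import io

import numpy as np

# The data snippet from the question (whitespace-separated, one header row)
SNIPPET = """feat1 feat2 feat3 feat4 Label
-3.8481877    -0.47685334   0.63422906    1.0396314   1
-2.320888     0.65347993    1.1519914     0.12997247  1
1.5827686     1.4119303     -1.7410104    -4.6962333  1
-0.1337152    0.13315737    -1.6648949    -1.4205348  1
-0.4028037    1.332986      1.3618442     0.3292255   1
-0.015517877  1.346349      1.4083523     0.87017965  1
-0.2669228    0.5478992     -0.06730786   -1.5959451  1
-0.03318152   0.3263167     -2.116833     -5.4616213  1
0.4588691     0.6723614     -1.617398     -4.3511734  1
0.5899199     0.66525555    -1.694493     -3.9452586  1
1.610061      2.4186094     1.8807093     1.3764497   0
1.7985699     2.4387648     1.6306056     1.1184534   0
-9.222036     -9.9776       -9.832        -9.909746   0
0.21364458    -1.0171559    -4.9093766    -6.2154694  0
-0.019955145  -1.1677283    -4.6549516    -5.9503417  0
0.44730473    -0.77167743   -4.7527356    -5.971007   0
-0.16508447   -0.005777468  -1.5020386    -4.49326    0
-0.8654994    -0.54387957   -1.300646     -4.621529   0
-1.7471086    -2.0005553    -1.7533782    -2.6065414  0
-1.5313624    -1.6995796    -1.4394685    -2.600004   0
"""

# delimiter=None (the default) splits on any run of whitespace
data = np.loadtxt(io.StringIO(SNIPPET), skiprows=1)
X_train_flat = data[:, :4]       # the four feature columns
ydata = data[:, 4].astype(int)   # class labels (0 or 1)
```

With pandas, 'pd.read_csv(io.StringIO(SNIPPET), sep=r"\s+")' gives the same table as a DataFrame.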

Can you assist me in generating the separation plane? Thanks!


Solution

  • Took quite a few hours, but here's my attempt at it.

    There are 2 things that need to be done:

    To generate the points on the plane, we reuse part of the code from the 3D Plane in PCA post, which is based on the plane equation "ax + by + cz = d". It takes the fitted points in the 'x_pca' variable and the eigenvectors (principal axes) stored in the 'pca_train' variable (see the note at the end of this answer). The normal (a, b, c) comes from the 'eig_vec' variable. We then build a grid of x and y coordinates, take the 'centroid' of the points as a point on the plane to compute 'd', and solve "ax + by + cz = d" for z, which gives the x, y and z coordinates of the plane.

    Putting the plane on the scatter plot is the simplest part. Following the Adding Planes to a 3D Scatter post, we pass the points xx, yy and z to a go.Surface trace and add it to the figure. To change the colour of the plane, replace both '#00FFFF' hex values in the colorscale with another colour.

    The code:

    import numpy as np
    import pandas as pd
    import plotly.express as px
    import plotly.graph_objects as go
    from sklearn.decomposition import PCA
    
    df = pd.read_csv('df00_snippet.csv')
    X_train_flat = df.drop(columns=['Label']).values
    
    ydata = df['Label'].values
    
    # Fit the PCA and project the data in one step
    pca_train = PCA(n_components=4)
    x_pca = pca_train.fit_transform(X_train_flat)
    
    y_train_new = ydata.astype(str)
    
    # https://plotly.com/python/3d-scatter-plots/
    fig = px.scatter_3d(x_pca,
                x=x_pca[:, 0], y=x_pca[:, 1], z=x_pca[:, 2],
                labels={'x': 'PCA-1', 'y': 'PCA-2', 'z': 'PCA-3'},
                size_max=13,
                #symbol=y_train_new,
                opacity=1,
                color=y_train_new,
                color_discrete_sequence=["blue", "green"],
                title='3d Plot of Top 3 PCA components')
    
    # -- Start calculating the plane --
    # https://stackoverflow.com/questions/49957601/how-can-i-draw-3d-plane-using-pca-in-python
    
    eig_vec = pca_train.components_
    
    # Third principal axis used as the plane normal (a, b, c)
    # (rows of components_ live in the original feature space; see the note below)
    normal = eig_vec[2, :]
    centroid = np.mean(x_pca, axis=0)
    
    # Every point (x, y, z) on the plane should satisfy a*x + b*y + c*z = d
    
    # Taking the centroid as a point on the plane (only x, y, z are plotted)
    d = centroid[:3].dot(normal[:3])
    
    # Calculate the plane's x, y and z coordinates over the range of the data
    xx, yy = np.meshgrid((np.min(x_pca[:, 0]), np.max(x_pca[:, 0])),
                         (np.min(x_pca[:, 1]), np.max(x_pca[:, 1])))
    # Solve a*x + b*y + c*z = d for z
    z = (d - normal[0] * xx - normal[1] * yy) / normal[2]
    
    # Add the plane to the figure as a single-colour surface
    # https://stats.stackexchange.com/questions/163356/fitting-a-plane-to-a-set-of-points-in-3d-using-pca
    fig.add_trace(go.Surface(x=xx, y=yy, z=z, colorscale=[[0, '#00FFFF'], [1, '#00FFFF']], showscale=False))
    fig.show()
    

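    Note: After running this, the 'x' and 'y' axes look fine, but the plane's 'z' values seem to be off. I think it has something to do with this line:

    eig_vec = pca_train.components_

    'pca_train.components_' holds the principal axes expressed in the original 4-feature space, whereas 'x_pca' is already in PCA coordinates. Its third row is therefore a 4-element vector in a different coordinate system, not the normal of a plane in the PCA-1/PCA-2/PCA-3 space that is plotted.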
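A possible fix, sketched with NumPy only and not part of the original answer; all helper names and the synthetic blobs below are hypothetical stand-ins for 'x_pca[:, :3]' and 'ydata'. Computing the normal from the plotted points themselves keeps everything in one coordinate system. For 'x_pca[:, :3]' the columns are already uncorrelated (that is what the PCA projection does), so the least-variance normal comes out as approximately (0, 0, 1) up to sign, i.e. the PCA-1/PCA-2 plane at PCA-3 = 0. Because PCA ignores the labels, such a plane need not separate the classes at all. If a plane that actually separates class 0 from class 1 is wanted, one option (a swapped-in technique) is Fisher's linear discriminant: its normal is w = inv(S_w) (mu1 - mu0), with S_w the pooled within-class scatter matrix, and the plane is placed through the midpoint of the two class means.

```python
import numpy as np


def plane_from_normal(normal, point):
    """Return (a, b, c, d) of the plane a*x + b*y + c*z = d through `point`."""
    normal = np.asarray(normal, dtype=float)
    d = float(normal @ np.asarray(point, dtype=float))
    return normal[0], normal[1], normal[2], d


def plane_z(a, b, c, d, xx, yy):
    """Solve a*x + b*y + c*z = d for z on a grid (assumes c is not ~0)."""
    return (d - a * xx - b * yy) / c


def min_variance_normal(points):
    """Normal of the PCA best-fit plane: the direction of least variance."""
    centered = points - points.mean(axis=0)
    # Rows of vt are the principal axes, sorted by decreasing singular value
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[-1]


def fisher_normal(points, labels):
    """Fisher's linear discriminant direction for classes 0 and 1."""
    p0, p1 = points[labels == 0], points[labels == 1]
    # Pooled within-class scatter matrix S_w
    s_w = (np.cov(p0, rowvar=False) * (len(p0) - 1)
           + np.cov(p1, rowvar=False) * (len(p1) - 1))
    # w = S_w^-1 (mu1 - mu0), via a linear solve instead of an explicit inverse
    return np.linalg.solve(s_w, p1.mean(axis=0) - p0.mean(axis=0))


def plane_surface_kwargs(xx, yy, zz, hex_colour="#00FFFF", opacity=0.6):
    """Keyword arguments for plotly.graph_objects.Surface (single-colour plane)."""
    return dict(x=xx, y=yy, z=zz,
                colorscale=[[0, hex_colour], [1, hex_colour]],
                showscale=False, opacity=opacity)


# Hypothetical stand-in for x_pca[:, :3] and ydata: two separated 3D blobs
rng = np.random.default_rng(0)
pts = np.vstack([rng.normal(-2.0, 1.0, size=(50, 3)),
                 rng.normal(2.0, 1.0, size=(50, 3))])
labels = np.repeat([0, 1], 50)

# (1) Least-variance (PCA) plane through the centroid; ignores the labels
a, b, c, d = plane_from_normal(min_variance_normal(pts), pts.mean(axis=0))

# (2) Separation plane through the midpoint of the class means; uses the labels
w = fisher_normal(pts, labels)
midpoint = (pts[labels == 0].mean(axis=0) + pts[labels == 1].mean(axis=0)) / 2
sa, sb, sc, sd = plane_from_normal(w, midpoint)
predicted = (pts @ w > sd).astype(int)  # class 1 lies on the side w points to

# Plane corners over the x/y range of the data
xx, yy = np.meshgrid([pts[:, 0].min(), pts[:, 0].max()],
                     [pts[:, 1].min(), pts[:, 1].max()])
zz = plane_z(sa, sb, sc, sd, xx, yy)
surface_kwargs = plane_surface_kwargs(xx, yy, zz)
```

With the real data, 'pts' and 'labels' would be replaced by 'x_pca[:, :3]' and 'ydata', and the plane added with 'fig.add_trace(go.Surface(**surface_kwargs))'. If 'sc' is close to zero, the plane is nearly parallel to the z axis and solving for z is unstable; solving for x or y instead avoids that.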