I want to generate a 3D plot that shows the separation of the two classes. I looked at this solution, but I do not know how to add the separation plane to a px.scatter_3d figure.
Here is the code that I have so far:
import numpy as np
import matplotlib.pyplot as plt
import plotly.express as px
import seaborn as sns
import pandas as pd
import os
from mpl_toolkits.mplot3d import Axes3D
from sklearn.decomposition import PCA
#df = pd.read_csv('df00_snippet.csv')
#X_train_flat = df.drop(columns=['Label']).values
#ydata = df['Label'].values
pca_train = PCA().fit(X_train_flat)
pca_train = PCA(n_components = 4)
x_pca = pca_train.fit_transform(X_train_flat)
y_train_new = ydata.astype(str)
# https://plotly.com/python/3d-scatter-plots/
fig = px.scatter_3d(x_pca,
                    x=x_pca[:, 0], y=x_pca[:, 1], z=x_pca[:, 2],
                    labels={'x': 'PCA-1', 'y': 'PCA-2', 'z': 'PCA-3'},
                    color_discrete_sequence=["blue", "green"],
                    title='3d Plot of Top 3 PCA components')
Here is a snippet of my data:
feat1 feat2 feat3 feat4 Label
-3.8481877 -0.47685334 0.63422906 1.0396314 1
-2.320888 0.65347993 1.1519914 0.12997247 1
1.5827686 1.4119303 -1.7410104 -4.6962333 1
-0.1337152 0.13315737 -1.6648949 -1.4205348 1
-0.4028037 1.332986 1.3618442 0.3292255 1
-0.015517877 1.346349 1.4083523 0.87017965 1
-0.2669228 0.5478992 -0.06730786 -1.5959451 1
-0.03318152 0.3263167 -2.116833 -5.4616213 1
0.4588691 0.6723614 -1.617398 -4.3511734 1
0.5899199 0.66525555 -1.694493 -3.9452586 1
1.610061 2.4186094 1.8807093 1.3764497 0
1.7985699 2.4387648 1.6306056 1.1184534 0
-9.222036 -9.9776 -9.832 -9.909746 0
0.21364458 -1.0171559 -4.9093766 -6.2154694 0
-0.019955145 -1.1677283 -4.6549516 -5.9503417 0
0.44730473 -0.77167743 -4.7527356 -5.971007 0
-0.16508447 -0.005777468 -1.5020386 -4.49326 0
-0.8654994 -0.54387957 -1.300646 -4.621529 0
-1.7471086 -2.0005553 -1.7533782 -2.6065414 0
-1.5313624 -1.6995796 -1.4394685 -2.600004 0
Can you assist me in generating the separation plane? Thanks!
Took quite a few hours, but here's my attempt at it.
There are 2 things that need to be done:
1. Generate the points on the plane. This uses part of the code from the "3D Plane in PCA" post, which is built on the plane equation a*x + b*y + c*z + d = 0, together with the fitted points in the 'x_pca' variable and the eigenvectors from the 'pca_train' variable (see the note at the end of this answer). The normal (a, b, c) is taken from the 'eig_vec' variable. The 'centroid' of the points is used to calculate 'd', a grid of x and y coordinates is generated, and solving the plane equation for z gives the x, y and z coordinates of the plane.
2. Put the plane on the scatter plot, which is the simplest part. Following the "Adding Planes to a 3D Scatter" post, the points xx, yy and z are passed to go.Surface to draw the plane. To change the colour of the plane, pick a new colour and replace both hex values in the colorscale ('#00FFFF' in the code below).
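As a standalone illustration of the first step, here is a minimal sketch of how a normal vector and one point on the plane give the grid of plane coordinates. The function name plane_grid and the example values are made up for illustration and are not part of the linked posts.

```python
import numpy as np

def plane_grid(normal, point, x_range, y_range):
    """Return (xx, yy, zz) for the plane through `point` with normal `normal`.

    The plane is a*x + b*y + c*z + d = 0 with d = -(normal . point),
    solved for z, so the normal needs a non-zero z component.
    """
    a, b, c = normal
    if np.isclose(c, 0.0):
        raise ValueError("plane is vertical; cannot solve for z")
    d = -np.dot(normal, point)
    # A 2x2 grid spanning the requested x and y ranges is enough for a flat surface
    xx, yy = np.meshgrid(x_range, y_range)
    zz = (-a * xx - b * yy - d) / c
    return xx, yy, zz

# Made-up example values, not taken from the data in the question
normal = np.array([1.0, -2.0, 4.0])
point = np.array([0.5, 1.0, -1.0])
xx, yy, zz = plane_grid(normal, point, (-3.0, 3.0), (-2.0, 2.0))

# Every grid point should satisfy normal . (p - point) = 0
residual = normal[0] * xx + normal[1] * yy + normal[2] * zz - normal.dot(point)
```

The resulting xx, yy and zz arrays have the same shape, which is the form go.Surface expects for its x, y and z arguments.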
The code:
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
from sklearn.decomposition import PCA

# Load the data and split it into features and labels
df = pd.read_csv('df00_snippet.csv')
X_train_flat = df.drop(columns=['Label']).values
ydata = df['Label'].values

# Fit the PCA once and project the data onto its components
pca_train = PCA(n_components=4)
x_pca = pca_train.fit_transform(X_train_flat)
# Labels as strings so that plotly treats them as discrete classes
y_train_new = ydata.astype(str)
# https://plotly.com/python/3d-scatter-plots/
# Colour the points by class so that color_discrete_sequence takes effect
fig = px.scatter_3d(x_pca,
                    x=x_pca[:, 0], y=x_pca[:, 1], z=x_pca[:, 2],
                    color=y_train_new,
                    labels={'x': 'PCA-1', 'y': 'PCA-2', 'z': 'PCA-3', 'color': 'Label'},
                    color_discrete_sequence=["blue", "green"],
                    title='3d Plot of Top 3 PCA components')
# -- Start calculating the plane --
# https://stackoverflow.com/questions/49957601/how-can-i-draw-3d-plane-using-pca-in-python
eig_vec = pca_train.components_
# Third principal axis, used as the plane normal (a, b, c).
# Each row of components_ has one entry per original feature (4 here);
# only the first three entries are used below.
normal = eig_vec[2, :]
centroid = np.mean(x_pca, axis=0)
# Every point (x, y, z) on the plane should satisfy a*x + b*y + c*z + d = 0
# Taking centroid as a point on the plane
d = -centroid.dot(normal)
# Calculate the plane's x and y coordinates from the range of the data
xx, yy = np.meshgrid((np.min(x_pca[:, 0]), np.max(x_pca[:, 0])),
                     (np.min(x_pca[:, 1]), np.max(x_pca[:, 1])))
# Solve a*x + b*y + c*z + d = 0 for z
z = (-normal[0] * xx - normal[1] * yy - d) / normal[2]
# Add a plane to the figure
# https://stats.stackexchange.com/questions/163356/fitting-a-plane-to-a-set-of-points-in-3d-using-pca
# Using the same colour at both ends of the colorscale gives a single-colour plane
fig.add_trace(go.Surface(x=xx, y=yy, z=z,
                         colorscale=[[0, '#00FFFF'], [1, '#00FFFF']],
                         showscale=False))
fig.show()
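Note: After running this, the 'x' and 'y' axes seem to be OK, but the 'z' values of the plane seem to be off. I think this has something to do with this line:
eig_vec = pca_train.components_
A likely cause is that the rows of components_ are expressed in the original feature space (one entry per original feature), whereas the scatter plot uses the PCA-transformed coordinates in x_pca, so the first three entries of a row are not necessarily a valid normal vector in the plotted space.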
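Since the question asks for a plane that separates the two classes, and PCA does not use the labels at all, one possible alternative is to fit a linear classifier directly on the three plotted coordinates and draw its decision boundary. The sketch below swaps in a plain least-squares linear classifier written with NumPy; it is not the PCA-eigenvector approach used above. The function name fit_separating_plane and the synthetic two-blob data are made up for illustration.

```python
import numpy as np

def fit_separating_plane(points, labels):
    """Least-squares linear classifier for two classes in 3D.

    Returns (w, b) such that w . p + b = 0 is the decision plane and
    the sign of w . p + b is the predicted side.
    """
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    if classes.size != 2:
        raise ValueError("exactly two classes are required")
    # Encode the two classes as -1 and +1 regression targets
    targets = np.where(labels == classes[1], 1.0, -1.0)
    # Append a column of ones so the intercept is fitted as well
    design = np.column_stack([points, np.ones(len(points))])
    coef, *_ = np.linalg.lstsq(design, targets, rcond=None)
    return coef[:3], coef[3]

# Synthetic, well-separated example data (NOT the data from the question)
rng = np.random.default_rng(0)
class0 = rng.normal(loc=[-2.0, -2.0, -2.0], scale=0.5, size=(50, 3))
class1 = rng.normal(loc=[2.0, 2.0, 2.0], scale=0.5, size=(50, 3))
points = np.vstack([class0, class1])
labels = np.array([0] * 50 + [1] * 50)

w, b = fit_separating_plane(points, labels)

# Fraction of points that fall on the correct side of the plane
predicted = (points @ w + b > 0).astype(int)
accuracy = np.mean(predicted == labels)

# Plane coordinates over the x/y range of the data, solved for z
# (this assumes w[2] is not close to zero)
xx, yy = np.meshgrid((points[:, 0].min(), points[:, 0].max()),
                     (points[:, 1].min(), points[:, 1].max()))
zz = -(w[0] * xx + w[1] * yy + b) / w[2]
```

With the answer's variables, points would be x_pca[:, :3] and labels would be ydata, and xx, yy and zz can be passed to go.Surface in the same way as in the add_trace call above. A classifier from scikit-learn such as LogisticRegression or LinearSVC could be used instead, reading the plane from its coef_ and intercept_ attributes. Whether any plane separates the real data cleanly depends on the data itself.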