I have a 3D scatter plot that displays a dataframe named data.
It typically generates a shape that could be fit with a single line or an ellipse.
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
import pandas as pd
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(data['x'], data['y'], data['z'], c=data['c'])
plt.show()
Typical example (sorry I cannot share my data...):
So, now I would like to compute a multivariate regression that fits this cloud of dots. There are a lot of articles explaining how to fit this with a plane, but I would like to fit it with a line.
As a bonus, I would also like to fit these points with an ellipse, which would reflect the standard deviation and be much more visual.
I found the answer to the first question, which is how to find the line that best fits the point cloud. I adapted this post to Python:
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
data = pd.DataFrame([[-1, 15, 2], [2, 6, 8], [5, 4, 20], [1, 5, 20], [3, 9, 12]],
columns=['x', 'y', 'z'])
ax.scatter(data['x'], data['y'], data['z'], c='blue')
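# Best-fit line through the point cloud (orthogonal regression)
X = data[['x', 'y', 'z']].values
Xlen = X.shape[0]
avgPointCloud = X.mean(axis=0)  # centroid of the cloud
Xmean = X - avgPointCloud  # centered points
cov = Xmean.T.dot(Xmean) / Xlen  # 3x3 covariance matrix
# The line direction is the eigenvector with the largest eigenvalue, not a
# column of cov itself. np.linalg.eigh returns eigenvalues in ascending order,
# so the principal direction is the last column.
eigvals, eigvecs = np.linalg.eigh(cov)
direction = eigvecs[:, -1]
t = np.arange(-5, 5, 1)
linearReg = avgPointCloud + direction * t[:, np.newaxis]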
ax.plot(linearReg[:, 0], linearReg[:, 1], linearReg[:, 2], 'r', label='Linear Regression')
ax.legend()
plt.show()
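For the bonus question, one possible approach (a sketch, not something taken from the post I adapted) is to draw the covariance ellipsoid of the cloud: its axes are the eigenvectors of the covariance matrix and its semi-axis lengths are the square roots of the eigenvalues, scaled by the number of standard deviations you want to show. The helper name `covariance_ellipsoid` and its parameters `n_std` and `resolution` are my own and purely illustrative.

```python
import numpy as np


def covariance_ellipsoid(points, n_std=1.0, resolution=30):
    """Return the centroid and a surface mesh (xs, ys, zs) of the n_std covariance ellipsoid.

    Hypothetical helper for illustration; assumes `points` is an (N, 3) array.
    """
    points = np.asarray(points, dtype=float)
    center = points.mean(axis=0)
    cov = np.cov(points, rowvar=False)        # 3x3 sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)    # ascending eigenvalues, orthonormal axes
    radii = n_std * np.sqrt(np.clip(eigvals, 0.0, None))  # semi-axis lengths

    # Unit sphere sampled in spherical coordinates
    u = np.linspace(0.0, 2.0 * np.pi, resolution)
    v = np.linspace(0.0, np.pi, resolution)
    sphere = np.stack(
        [
            np.outer(np.cos(u), np.sin(v)),
            np.outer(np.sin(u), np.sin(v)),
            np.outer(np.ones_like(u), np.cos(v)),
        ],
        axis=-1,
    )  # shape (resolution, resolution, 3)

    # Scale along the principal axes, rotate into data coordinates, then shift
    surface = (sphere * radii) @ eigvecs.T + center
    return center, surface[..., 0], surface[..., 1], surface[..., 2]
```

With the script above, something like `center, xs, ys, zs = covariance_ellipsoid(X, n_std=2)` followed by `ax.plot_wireframe(xs, ys, zs, color='g', alpha=0.3)` before `plt.show()` should overlay the ellipsoid on the scatter plot. Whether 1, 2, or 3 standard deviations looks best depends on the data.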