python linear-regression pca matplotlib-3d

Add regression line and ellipse to a 3D scatter plot


I have a 3D scatter plot that displays a dataframe named data. It typically produces a shape that could be fitted with a single line or an ellipse.

from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
import pandas as pd

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')

ax.scatter(data['x'], data['y'], data['z'], c=data['c'])

plt.show()

Typical example (sorry I cannot share my data...):

[Image: 3D scatter plot]

So, now I would like to compute a multivariate regression that fits this cloud of dots. There are a lot of articles explaining how to fit this with a plane, but I would like to fit it with a line.

As a bonus, I would also like to fit these dots with an ellipse. That way the plot would reflect the standard deviation and be much more visual.


Solution

  • I found the answer to the first question, which is to find the line that best fits the point cloud. I adapted this post to Python:

    from mpl_toolkits.mplot3d import Axes3D
    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd
    
    fig = plt.figure()
    ax = fig.add_subplot(111, projection='3d')
    
    data = pd.DataFrame([[-1, 15, 2], [2, 6, 8], [5, 4, 20], [1, 5, 20], [3, 9, 12]],
                        columns=['x', 'y', 'z'])
    ax.scatter(data['x'], data['y'], data['z'], c='blue')
    
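    # Best-fit line: it passes through the centroid, along the first principal axis
    X = data[['x', 'y', 'z']].values
    Xlen = X.shape[0]
    avgPointCloud = X.mean(axis=0)
    Xmean = X - avgPointCloud  # centered points
    
    cov = 1 / Xlen * Xmean.T.dot(Xmean)
    
    # The direction is the eigenvector of cov with the largest eigenvalue
    # (eigh returns eigenvalues in ascending order, so take the last column)
    eigvals, eigvecs = np.linalg.eigh(cov)
    direction = eigvecs[:, -1]
    
    t = np.arange(-10, 11, 1)
    linearReg = avgPointCloud + direction * np.vstack(t)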
    
    ax.plot(linearReg[:, 0], linearReg[:, 1], linearReg[:, 2], 'r', label='Linear Regression')
    ax.legend()
    
    plt.show()
    

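The answer above covers only the line. For the second part of the question, one possible approach (a sketch, not part of the original answer) is to draw the covariance ellipsoid of the points, since the 3D counterpart of an ellipse is an ellipsoid. Its axes are the eigenvectors of the covariance matrix, and each semi-axis is a chosen number of standard deviations long. The names covariance_ellipsoid and add_ellipsoid and the parameters n_std and resolution are made up for this illustration. Note that np.cov divides by n - 1, whereas the code above divides by n.

```python
import numpy as np


def covariance_ellipsoid(points, n_std=2.0, resolution=30):
    """Return the centroid and the surface grids (x, y, z) of the covariance ellipsoid.

    points: (n, 3) array-like. n_std: semi-axis length, in standard deviations
    along each principal axis (hypothetical parameter for this sketch).
    """
    points = np.asarray(points, dtype=float)
    center = points.mean(axis=0)
    cov = np.cov(points, rowvar=False)        # 3x3 sample covariance (n - 1 divisor)
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigenvalues in ascending order
    radii = n_std * np.sqrt(np.clip(eigvals, 0.0, None))

    # Unit sphere sampled in spherical coordinates
    u = np.linspace(0.0, 2.0 * np.pi, resolution)
    v = np.linspace(0.0, np.pi, resolution)
    sphere = np.stack([
        np.outer(np.cos(u), np.sin(v)),
        np.outer(np.sin(u), np.sin(v)),
        np.outer(np.ones_like(u), np.cos(v)),
    ], axis=-1)                               # shape (resolution, resolution, 3)

    # Scale along the principal axes, rotate into data coordinates, shift to the centroid
    surface = (sphere * radii) @ eigvecs.T + center
    return center, surface[..., 0], surface[..., 1], surface[..., 2]


def add_ellipsoid(ax, points, n_std=2.0, **kwargs):
    """Draw the ellipsoid as a wireframe on an existing 3D axes."""
    _, x, y, z = covariance_ellipsoid(points, n_std)
    return ax.plot_wireframe(x, y, z, **kwargs)
```

With the figure from the answer, calling add_ellipsoid(ax, data[['x', 'y', 'z']].values, n_std=2.0, color='g', alpha=0.3) before plt.show() should overlay the ellipsoid on the scatter plot. The value of n_std is a display choice here, not a calibrated confidence level.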