Search code examples
pythonnumpygeometrycomputational-geometryellipse

Fitting 3D ellipsoid to points in 3D space - different methods, different answers


I have a large 3D point cloud that I'm trying to define the shape, size and volume of. I'm using scipy.spatial.convexHull to find the convex hull of the point cloud which gives me the volume of the point cloud and I'm then using the vertices of the convex hull to fit an ellipsoid to that to define the size and shape. I identified two possible python modules for fitting of the ellipsoid : pyEllipsoid_Fit (https://github.com/marksemple/pyEllipsoid_Fit) and ellipsoid_fit_python (https://github.com/aleksandrbazhin/ellipsoid_fit_python) which both seem to derived from the same MATLAB library by Yury Petrov. This all sounds pretty simple.

I've shown a reduced set of points from my convex hull to give an idea of its shape but it's about 30,000 points in the cloud and the convex hull has about 2,000 vertices.

clusterData = np.array([[ 0.73938582, -0.64072618, -0.30248798],
       [ 0.7456499 , -0.62475833, -0.26983183],
       [ 0.71082352, -0.64138446, -0.26690041],
       [ 0.70722122, -0.61039429, -0.25441756],
       [ 0.73099042, -0.63183361, -0.23934509],
       [ 0.7594304 , -0.60428986, -0.29554205],
       [ 0.71364348, -0.66918549, -0.24510986]])

Running the ellipsoid fit on the hull points gives the following results:

center, evecs, radii, v = ellipsoid_fit(hull.points[hull.vertices])

center
Out[51]: array([ 0.73215066, -0.63262089, -0.26906655])

radii
Out[52]: array([0.03521219, 0.06578687, 0.0504479 ])

Both fit modules return the same result for centre point and radii. The centre point values seem plausible and are roughly in the middle of the point cloud and the radii are also plausible although the y value seems possibly a little large.

I also came across a different method using a least squares regression using numpy that looked simple and efficient at https://jekel.me/2020/Least-Squares-Ellipsoid-Fit/. I've implemented this method for comparison:

    def ellipsoid_fit_LS(pos):
        
        # centre coordinates on origin
        pos = pos - np.mean(pos, axis=0)
        
        # build our regression matrix
        A = pos**2
        
        # vector of ones
        O = np.ones(len(A))
        
        # least squares solver
        B, resids, rank, s = np.linalg.lstsq(A, O)
        
        # solving for a, b, c
        a_ls = np.sqrt(1.0/B[0])
        b_ls = np.sqrt(1.0/B[1])
        c_ls = np.sqrt(1.0/B[2])
        
        return (a_ls, b_ls, c_ls)

running this method on my data points returns a different answer for the 3 radii:

ellipsoid_fit_LS(hull.points[hull.vertices])
Out[55]: (0.05628746742089332, 0.07109498037367977, 0.04941867027310397)

The differences are not huge, but I'm wondering why there is a noticeable difference in the answers, particularly in the a/x coefficient. Is one method more correct than the other?

Is the simple numpy one different because it only includes the square terms for a general ellipsoid and not the other coefficient terms, which I think are for other shapes?

I'm curious as the simple numpy one is about 3 times faster on my computer for this small dataset. I'm going to be doing this 1000's of times on much larger datasets than the 2000 point hull used in this example so I don't want to be picking the faster once if it's significantly less accurate.

Maybe someone well versed in computational geometry could comment!


Solution

  • In comparing these two approaches, both use a least squares approach but ellipsoid_fit_LS is assuming that the major axes of the ellipse align with (x, y, z) whereas the matlab knock-offs don't make this assumption. That is, they are more general and will produce better fits for ellipsoids that are tilted relative to the coordinate axes but therefore have more parameters to fit.

    Good but different fits to data are always like this, with similar but not identical results. In general, a good start is to choose a fitting algorithm that has the fewest free parameters as possible but is consistent with the data and your model.