Search code examples
pythoncurve-fittingdata-fitting

How to do a polynomial fit with fixed points in 3D


I have sets of x,y,z points in 3D space and another variable called charge which represents the amount of charge that was deposited in a specific x,y,z coordinate. I would like to do a weighted (weighted by the amount of charge deposited in the detector, which just corresponds to a higher weight for more charge) for this data such that it passes through a given point, the vertex.

Now, when I did this for 2D, I tried all sorts of methods (bringing the vertex to the origin and doing the same transformation for all the other points and forcing the fit to go through the origin, giving the vertex really high weight) but none of them were as good as the answer given here by Jaime: How to do a polynomial fit with fixed points

It uses the method of Lagrange multipliers, which I'm vaguely familiar with from an undergraduate Advanced Multi variable course, but not much else and it doesn't seem like the transformation of that code will be as easy as just adding a z coordinate. (Note that even though the code doesn't take into consideration the amount of charge deposited, it still gave me the best results). I was wondering if there was a version of the same algorithm out there, but in 3D. I also contacted the author of the answer in Gmail, but didn't hear back from him.

Here is some more information about my data and what I'm trying to do in 2D: How to weigh the points in a scatter plot for a fit?

Here is my code for doing this in a way where I force the vertex to be at the origin and then fit the data setting fit_intercept=False. I'm currently pursuing this method for 2D data since I'm not sure if there's a 3D version out there for Lagrange multipliers, but there are linear regression ways of doing this in 3D, for instance, here: Fitting a line in 3D:

import numpy as np
import sklearn.linear_model

def plot_best_fit(image_array, vertexX, vertexY):
    weights = np.array(image_array)
    x = np.where(weights>0)[1]
    y = np.where(weights>0)[0]
    size = len(image_array) * len(image_array[0])
    y = np.zeros((len(image_array), len(image_array[0])))
    for i in range(len(np.where(weights>0)[0])):
        y[np.where(weights>0)[0][i]][np.where(weights>0)[1][i]] = np.where(weights>0)[0][i]
    y = y.reshape(size)
    x = np.array(range(len(image_array)) * len(image_array[0]))
    weights = weights.reshape((size))
    for i in range(len(x)):
        x[i] -= vertexX
        y[i] -= vertexY
    model = sklearn.linear_model.LinearRegression(fit_intercept=False)
    model.fit(x.reshape((-1, 1)),y,sample_weight=weights)
    line_x = np.linspace(0, 512, 100).reshape((-1,1))
    pred = model.predict(line_x)
    m, b = np.polyfit(np.linspace(0, 512, 100), np.array(pred), 1)
    angle = math.atan(m) * 180/math.pi
    return line_x, pred, angle, b, m

image_array is a numpy array and vertexX and vertexY are the x and y coordinates of the vertex, respectively. Here's my data: https://uploadfiles.io/bbhxo. I cannot create a toy data as there is not a simple way of replicating this data, it was produced by Geant4 simulation of a neutrino interacting with an argon nucleus. I don't want to get rid of the complexity of the data. And this specific event happens to be the one for which my code does not work, I'm not sure if I can generate a data specifically so my code doesn't work on it.


Solution

  • This is more a hand made solution using basic optimization. It is straight forward. One just measures the distance of a point to the line to be fitted and minimizes the weighted distances using basic optimize.leastsq. Code is as follows:

    # -*- coding: utf-8 -*
    from matplotlib import pyplot as plt
    from mpl_toolkits.mplot3d import Axes3D
    import matplotlib.cm as cm
    from scipy import optimize
    import numpy as np
    
    def rnd( a ):
        return  a * ( 1 - 2 * np.random.random() ) 
    
    def affine_line( s, theta, phi, x0, y0, z0 ):
        a = np.sin( theta) * np.cos( phi )
        b = np.sin( theta) * np.sin( phi )
        c = np.cos( theta )
        return np.array( [ s * a + x0, s * b + y0, s * c + z0 ] )
    
    def point_to_line_distance( x , y, z , theta, phi, x0, y0, z0 ):
        xx = x - x0
        yy = y - y0
        zz = z - z0
        a = np.sin( theta) * np.cos( phi )
        b = np.sin( theta) * np.sin( phi )
        c = np.cos( theta )
        r = np.array( [ xx, yy, zz ] )
        t = np.array( [ a, b, c ] )
        return np.linalg.norm( r - np.dot( r, t) * t )
    
    def residuals( parameters, fixpoint, data, weights=None ):
        theta, phi = parameters
        x0, y0, z0 = fixpoint
        if weights is None:
            w = np.ones( len( data ) )
        else:
            w = np.array( weights )
        diff = np.array( [ point_to_line_distance( x , y, z , theta, phi , *fixpoint ) for x, y, z in data ] )
        diff = diff * w
        return diff
    
    ### some test data
    fixpoint = [ 1, 2 , -.3 ]
    trueline = np.array( [ affine_line( s, .7, 1.7, *fixpoint ) for s in np.linspace( -1, 2, 50 ) ] )
    rndData = np.array( [ np.array( [ a + rnd( .6), b + rnd( .35 ), c + rnd( .45 ) ] ) for a, b, c in trueline ] )
    zData = [ 20 * point_to_line_distance( x , y, z , .7, 1.7, *fixpoint ) for x, y, z in rndData ]
    
    ### unweighted
    bestFitValuesUW, ier= optimize.leastsq(residuals, [ 0, 0],args=( fixpoint, rndData ) )
    print bestFitValuesUW
    uwLine = np.array( [ affine_line( s, bestFitValuesUW[0], bestFitValuesUW[1], *fixpoint ) for s in np.linspace( -2, 2, 50 ) ] )
    
    ### weighted ( chose inverse distance as weight....would be charge in OP's case )
    bestFitValuesW, ier= optimize.leastsq(residuals, [ 0, 0],args=( fixpoint, rndData, [ 1./s for s in zData ] ) )
    print bestFitValuesW
    wLine = np.array( [ affine_line( s, bestFitValuesW[0], bestFitValuesW[1], *fixpoint ) for s in np.linspace( -2, 2, 50 ) ] )
    
    ### plotting
    fig = plt.figure()
    ax = fig.add_subplot( 1, 1, 1, projection='3d' )
    ax.plot( *np.transpose(trueline ) ) 
    ax.scatter( *fixpoint, color='k' )
    ax.scatter( rndData[::,0], rndData[::,1], rndData[::,2] , c=zData, cmap=cm.jet )
    
    ax.plot( *np.transpose( uwLine ) ) 
    ax.plot( *np.transpose( wLine ) ) 
    
    ax.set_xlim( [ 0, 2.5 ] )
    ax.set_ylim( [ 1, 3.5 ] )
    ax.set_zlim( [ -1.25, 1.25 ] )
    
    plt.show()
    

    which returns

    >> [-0.68236386 -1.3057938 ]
    >> [-0.70928735 -1.4617517 ]
    

    results

    The fix point is shown in black. The original line in blue. The unweighted and weighted fit are in orange and green, respectively. Data is colored according to distance to line.