Search code examples
pythonnumpydata-fitting

How to weigh the points in a scatter plot for a fit?


So, I looked up information about the weights parameter in the polyfit (numpy.polynomial.polynomial.polyfit) function in Python and it seems like it has something to do with the error associated with the individual points. (How to include measurement errors in numpy.polyfit)

However, what I am trying to do has nothing to do with the error, but weights. I have an image in the form of a numpy array which indicates the amount of charge deposited in the detector. I convert that image to a scatter plot and then do a fit. But I want that fit to give more weight to the points which have more charge deposited and less to the ones that have less charge. Is that what the weights parameter is for?

Here's an example image:image of the shower Here's my code:

def get_best_fit(image_array, fixedX, fixedY):
    weights = np.array(image_array)
    x = np.where(weights>0)[1]
    y = np.where(weights>0)[0]
    size = len(image_array) * len(image_array[0])
    y = np.zeros((len(image_array), len(image_array[0])))
    for i in range(len(np.where(weights>0)[0])):
        y[np.where(weights>0)[0][i]][np.where(weights>0)[1][i]] = np.where(weights>0)[0][i]
    y = y.reshape(size)
    x = np.array(range(len(image_array)) * len(image_array[0]))
    weights = weights.reshape((size))
    b, m = polyfit(x, y, 1, w=weights)
    angle = math.atan(m) * 180/math.pi
    return b, m, angle

Let me explain to you the code:

The first line assigns the charged deposited in a variable called weights. The next two lines get the points where the charge deposited is >0, so there's some charge deposited to capture the coordinates for the scatter plot. Then I get the size of the entire image to later convert to just a one dimensional array for plotting. I then go through the image and try to get the coordinates of the points where there's some charge deposited (remember that the amount of charge is stored in the variable weights). I then reshape the y coordinates to get a one dimensional array and get the x coordinates for all the corresponding y coordinates from the image, then change the shape of the weights too to be just one dimensional.

Edit: if there's a way of doing this using the np.linalg.lstsq function, that would be ideal since I'm also trying to get the fit to go through the vertex of the plot. I could just reposition the plot so the vertex is at zero and then use np.linalg.lstsq, but that wouldn't allow me to use the weights.


Solution

  • You can use sklearn.linear_model.LinearRegression. It allows you to not fit the intercept (i.e. line goes through the origin, or, with some finagling, the point of your choice). It also deals with weighted data.

    e.g. (mostly stolen shamelessly from @Hiho's answer)

    import numpy as np
    import matplotlib.pyplot as plt
    import sklearn.linear_model
    
    y = np.array([1.0, 3.3, 2.2, 4.25, 4.8, 5.1, 6.3, 7.5])
    x = np.arange(y.shape[0]).reshape((-1,1))
    w = np.linspace(1,5,y.shape[0])
    
    model = sklearn.linear_model.LinearRegression(fit_intercept=False)
    model.fit(x, y, sample_weight=w)
    
    line_x = np.linspace(min(x), max(x), 100).reshape((-1,1))
    pred = model.predict(line_x)
    
    plt.scatter(x, y)
    plt.plot(line_x, pred)
    
    plt.show()