
Weighted Least Squares in Python


I am trying to do a weighted least squares fit of my data. The values when x=0 and when x>0.2 should be weighted heavily. The values when 0<x<=0.2 should be weighted to a lesser extent but I still want them to have some weighting.

My data is:

filtered_x = array([0.        , 0.03807991, 0.05077321, 0.06346652, 0.07615982,
       0.08885313, 0.10154643, 0.11423973, 0.12693304, 0.13962634,
       0.15231964, 0.16501295, 0.17770625, 0.19039955, 0.20309286,
       0.21578616, 0.22847947, 0.24117277, 0.25386607, 0.26655938,
       0.27925268, 0.29194598, 0.30463929, 0.31733259, 0.33002589,
       0.3427192 ])

filtered_y = array([0.        , 1.53989397, 2.04460628, 4.18043213, 2.97621482,
       2.82642339, 2.98335023, 2.98964836, 2.12218901, 1.42801972,
       1.25930683, 0.71644077, 0.48220866, 0.21165985, 0.24756609,
       0.21123179, 0.57344999, 0.49362762, 0.20282767, 0.50321186,
       0.50347165, 0.74259408, 0.48493783, 0.81785588, 0.54543666,
       0.53218838])

I have the following code already:

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

# fitted function
def fpow(x, a, b, c):
    return a*x**b + c

# custom weight function, passed to curve_fit as sigma:
# smaller sigma = heavier weighting, so x = 0 and the higher wavenumbers are weighted heavier
def custom_weights(x):
    weights = np.ones_like(x)  # Default weight for all points
    weights[x == 0] = 0.1      # Low Error for x = 0
    weights[x > 0.2] = 0.1     # Low Error for x > 0.2
    weights[(0 < x) & (x <= 0.2)] = 0.5  # Medium error for 0 < x <= 0.2
    return weights

# initial guess
pars0 = (0.4, 0.4, 1)

# perform fitting with custom weights
popt, pcov = curve_fit(fpow, filtered_x, filtered_y, absolute_sigma=True,
                       p0=pars0, sigma=custom_weights(filtered_x), maxfev=5000)

# parameters
a_opt = popt[0]
b_opt = popt[1]
c_opt = popt[2]

# plot data
plt.errorbar(filtered_x, filtered_y, yerr=0, fmt=".", color='black', label='data', zorder=1, markersize=5)

# create x interval for evaluating the fitted curve
x_interval = np.linspace(0, max(filtered_x), 1000)
y_fit = fpow(x_interval, *popt)
plt.plot(x_interval, y_fit, color="red", label="Weighted fit", zorder=2, linewidth=3)

plt.grid(True)
plt.ylabel("U [m/s]")
plt.xlabel("Wavenumber [rad/m]")
plt.title("LS Weighted Fit of Current")
plt.legend()
plt.show()

I get the following plot: (image: result of the weighted fit)

It also gives me a runtime warning:

<ipython-input-747-a3ac7b8135e0>:2: RuntimeWarning: divide by zero encountered in power
  return a*x**b+c

You can see that something has gone wrong: the fit is not good for the larger x values. I have tried removing the (0, 0) point from the data; however, I get an even worse result, shown below: (image: fit with the (0, 0) point removed)

Edit: I'm looking for something that looks somewhat more like the following: (image: desired fit shape)


Solution

  • There is an existing question on here about this kind of RuntimeWarning, and its answer suggests using a very small value instead of zero in your data.

    filtered_x = array([1e-8, 0.03807991, 0.05077321, 0.06346652, 0.07615982,
                        0.08885313, 0.10154643, 0.11423973, 0.12693304, 0.13962634,
                        0.15231964, 0.16501295, 0.17770625, 0.19039955, 0.20309286,
                        0.21578616, 0.22847947, 0.24117277, 0.25386607, 0.26655938,
                        0.27925268, 0.29194598, 0.30463929, 0.31733259, 0.33002589,
                        0.3427192])
    
    filtered_y = array([1e-8, 1.53989397, 2.04460628, 4.18043213, 2.97621482,
                        2.82642339, 2.98335023, 2.98964836, 2.12218901, 1.42801972,
                        1.25930683, 0.71644077, 0.48220866, 0.21165985, 0.24756609,
                        0.21123179, 0.57344999, 0.49362762, 0.20282767, 0.50321186,
                        0.50347165, 0.74259408, 0.48493783, 0.81785588, 0.54543666,
                        0.53218838])
    

    Notice the 1e-8 in the first position. This gets rid of the warning. It also leads to a somewhat improved fit.

    (image: new fit after changing the data)

    Check out the other question though, because it suggests more methods to improve the fit.
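    The substitution can also be applied programmatically, which avoids hand-editing the arrays. A minimal sketch (the eps value is just an arbitrary small number, matching the 1e-8 used above):

```python
import numpy as np

x = np.array([0.0, 0.03807991, 0.05077321])
eps = 1e-8  # small stand-in for exact zero, avoids evaluating 0**b
x_safe = np.where(x == 0, eps, x)  # replace exact zeros, leave the rest untouched
print(x_safe[0])  # 1e-08
```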

    EDIT:

    I forgot to mention that I changed your custom weight function so it uses np.isclose. Checking floating-point numbers for exact equality is error-prone and usually not what you want. np.isclose checks whether numbers lie within a given tolerance of the value being compared against. There's a similar function, math.isclose, in the stdlib.

    # custom weight function
    def custom_weights(x):
        weights = np.ones_like(x)  # Default weight for all points
        weights[np.isclose(x, 0, atol=1e-7)] = 0.1  # Low Error for x = 0
        weights[x > 0.2] = 0.1  # Low Error for x > 0.2
        weights[(0 < x) & (x <= 0.2)] = 0.5  # Medium error for 0 < x <= 0.2
        return weights
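    To illustrate why exact comparison is the wrong tool here, a quick demo of floating-point rounding versus np.isclose (the example values are arbitrary):

```python
import numpy as np

# exact comparison fails because of floating-point rounding
print(0.1 + 0.2 == 0.3)            # False: 0.1 + 0.2 is 0.30000000000000004
print(np.isclose(0.1 + 0.2, 0.3))  # True: within the default tolerances

# elementwise check against zero with an absolute tolerance
x = np.array([1e-8, 0.05, 0.25])
print(np.isclose(x, 0, atol=1e-7))  # [ True False False]
```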
    

    EDIT 2:

    I've played around with the weights a bit to get something like the plot you shared. But I'm no domain expert, so this might be garbage; you will need to validate it on your own.

    # custom weight function
    def custom_weights(x):
        weights = np.ones_like(x)  # default weight for all points
        weights[(0 < x) & (x <= 0.2)] = 1  # medium error for 0 < x <= 0.2
        weights[x > 0.2] = 0.1  # low error for x > 0.2
        weights[np.isclose(x, 0, atol=1e-8)] = 0.01  # very low error for x = 0
        return weights
    

    Using these weights, I get the following plot:

    (image: plot produced by the amended weights)

    Also, there was another bug in the custom_weights function.

    The weights for values close to zero were reset by the (0 < x) & (x <= 0.2) branch at the end, because 1e-8 satisfies that condition too. I've changed the order of the assignments so this no longer happens. A more robust approach would be to use np.isclose there as well, since the result shouldn't depend on the order in which the masks are applied.
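    For what it's worth, np.select makes the precedence explicit instead of relying on assignment order: conditions are evaluated in order and the first match wins, so a later branch can't silently overwrite an earlier one. A sketch reusing the thresholds from EDIT 2:

```python
import numpy as np

def custom_weights(x):
    # first matching condition wins, so the zero check can't be overwritten
    conditions = [
        np.isclose(x, 0, atol=1e-8),  # very low error for x ~ 0
        x > 0.2,                      # low error for x > 0.2
        (0 < x) & (x <= 0.2),         # medium error for 0 < x <= 0.2
    ]
    choices = [0.01, 0.1, 1.0]
    return np.select(conditions, choices, default=1.0)

print(custom_weights(np.array([1e-8, 0.1, 0.3])))  # [0.01 1.   0.1 ]
```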