Search code examples
pythonnumpycurve-fittingleast-squaresdata-fitting

How to use numy linalg lstsq to fit two datasets with same slope but different intercept?


I am trying to do weighted least-square fitting, and came across numpy.linalg.lstsq. I need to fit the weighted least squares. So, the following works:

# Generate some synthetic data from the model.
N = 50
x = np.sort(10 * np.random.rand(N))
yerr = 0.1 + 0.5 * np.random.rand(N)
y = 10.0 * x + 15
y += yerr * np.random.randn(N)
#do the fitting
err = 1/yerr**2
W = np.sqrt(np.diag(err))
x = x.flatten()
y = y.flatten()
A = np.vstack([x, np.ones(len(x))]).T
xw = np.dot(W,A)
yw = np.dot(W,y)
m, b = np.linalg.lstsq(xw, yw)[0]

which gives me the best-fit slope and intercept. Now, suppose I have two datasets with same slope but different intercepts? How would I do a joint fit such that I get best-fit slope plus two intercepts. I still need to have the weighted least square version. For an unweighted case, I found that the following works:

(m,b1,b2),_,_,_ = np.linalg.lstsq(np.stack([np.concatenate((x1,x2)),
                                        np.concatenate([np.ones(len(x1)),np.zeros(len(x2))]),
                                        np.concatenate([np.zeros(len(x1)),np.ones(len(x2))])]).T, 
                              np.concatenate((y1,y2)))

Solution

  • First of all I rewrite your first approach as it can be written clearer in my opinion like this

    weights = 1 / yerr
    m, b = np.linalg.lstsq(np.c_[weights * x, weights], weights * y, rcond=None)[0]
    

    To fit 2 datasets you can stack 2 arrays but make 0 some elements of matrix.

    np.random.seed(12)
    N = 3
    x = np.sort(10 * np.random.rand(N))
    yerr = 0.1 + 0.5 * np.random.rand(N)
    y = 10.0 * x + 15
    y += yerr * np.random.randn(N)
    
    M = 2
    x1 = np.sort(10 * np.random.rand(M))
    yerr1 = 0.1 * 0.5 * np.random.rand(M)
    y1 = 10.0 * x1 + 25
    y1 += yerr1 * np.random.randn(M)
    #do the fitting
    weights = 1 / yerr
    weights1 = 1 / yerr1
    first_column = np.r_[weights * x, weights1 * x1]
    second_column = np.r_[weights, [0] * x1.size]
    third_column = np.r_[[0] * x.size, weights1]
    a = np.c_[first_column, second_column, third_column]
    print(a)
    # [[  4.20211437   2.72576342   0.        ]
    #  [ 24.54293941   9.32075195   0.        ]
    #  [ 13.22997409   1.78771428   0.        ]
    #  [126.37829241   0.          26.03711851]
    #  [686.96961895   0.         124.44253391]]
    c = np.r_[weights * y, weights1 * y1]
    print(c)
    # [  83.66073785  383.70595203  159.12058215 1914.59065915 9981.85549321]
    m, b1, b2 = np.linalg.lstsq(a, c, rcond=None)[0]
    print(m, b1, b2)
    # 10.012202998026055 14.841412336510793 24.941219918240172
    

    EDIT

    If you want different slopes and one intercept you can do it this way. Probably it is better to grasp the general idea on the one slope 2 intercepts case. Take a look to array a: you construct it from weights as well as c so now it is unweighted problem. You try to find such vector = [slope, intercept1, intercept2] that a @ vector = c (as much as possible by minimizing sum of squares of differences). By putting zeros in a we make it separable: upper part of matrix a vary slope and intercept1 and down part of a vary slope and intercept2. Similar to 2 slopes case with vector = [slope1, slope2, intercept].

    first_column = np.r_[weights * x, [0] * x1.size]
    second_column = np.r_[[0] * x.size, weights1 * x1]
    third_column = np.r_[weights, weights1]