I am trying to do weighted least-square fitting, and came across numpy.linalg.lstsq
. I need to fit the weighted least squares. So, the following works:
# Generate some synthetic data from the model.
N = 50
x = np.sort(10 * np.random.rand(N))
yerr = 0.1 + 0.5 * np.random.rand(N)
y = 10.0 * x + 15
y += yerr * np.random.randn(N)
#do the fitting
err = 1/yerr**2
W = np.sqrt(np.diag(err))
x = x.flatten()
y = y.flatten()
A = np.vstack([x, np.ones(len(x))]).T
xw = np.dot(W,A)
yw = np.dot(W,y)
m, b = np.linalg.lstsq(xw, yw)[0]
which gives me the best-fit slope and intercept. Now, suppose I have two datasets with same slope but different intercepts? How would I do a joint fit such that I get best-fit slope plus two intercepts. I still need to have the weighted least square version. For an unweighted case, I found that the following works:
(m,b1,b2),_,_,_ = np.linalg.lstsq(np.stack([np.concatenate((x1,x2)),
np.concatenate([np.ones(len(x1)),np.zeros(len(x2))]),
np.concatenate([np.zeros(len(x1)),np.ones(len(x2))])]).T,
np.concatenate((y1,y2)))
First of all I rewrite your first approach as it can be written clearer in my opinion like this
weights = 1 / yerr
m, b = np.linalg.lstsq(np.c_[weights * x, weights], weights * y, rcond=None)[0]
To fit 2 datasets you can stack 2 arrays but make 0 some elements of matrix.
np.random.seed(12)
N = 3
x = np.sort(10 * np.random.rand(N))
yerr = 0.1 + 0.5 * np.random.rand(N)
y = 10.0 * x + 15
y += yerr * np.random.randn(N)
M = 2
x1 = np.sort(10 * np.random.rand(M))
yerr1 = 0.1 * 0.5 * np.random.rand(M)
y1 = 10.0 * x1 + 25
y1 += yerr1 * np.random.randn(M)
#do the fitting
weights = 1 / yerr
weights1 = 1 / yerr1
first_column = np.r_[weights * x, weights1 * x1]
second_column = np.r_[weights, [0] * x1.size]
third_column = np.r_[[0] * x.size, weights1]
a = np.c_[first_column, second_column, third_column]
print(a)
# [[ 4.20211437 2.72576342 0. ]
# [ 24.54293941 9.32075195 0. ]
# [ 13.22997409 1.78771428 0. ]
# [126.37829241 0. 26.03711851]
# [686.96961895 0. 124.44253391]]
c = np.r_[weights * y, weights1 * y1]
print(c)
# [ 83.66073785 383.70595203 159.12058215 1914.59065915 9981.85549321]
m, b1, b2 = np.linalg.lstsq(a, c, rcond=None)[0]
print(m, b1, b2)
# 10.012202998026055 14.841412336510793 24.941219918240172
EDIT
If you want different slopes and one intercept you can do it this way. Probably it is better to grasp the general idea on the one slope 2 intercepts case. Take a look to array a
: you construct it from weights as well as c
so now it is unweighted problem. You try to find such vector = [slope, intercept1, intercept2]
that a @ vector = c
(as much as possible by minimizing sum of squares of differences). By putting zeros in a
we make it separable: upper part of matrix a
vary slope
and intercept1
and down part of a
vary slope
and intercept2
. Similar to 2 slopes case with vector = [slope1, slope2, intercept]
.
first_column = np.r_[weights * x, [0] * x1.size]
second_column = np.r_[[0] * x.size, weights1 * x1]
third_column = np.r_[weights, weights1]