I've looked at the function approximation methods in scipy.optimize
and, after reading the descriptions of the functions, figured out (maybe wrongly) that they approximate non-linear functions only.
For instance, here is a sample output of zip() for my x and y:
[(1,1),(4,2),(6,4),(8,6),(10,11)]
As you can see, a non-linear function fits these points much better, but I need a linear fit for my purposes.
I admit I may have missed something in the documentation of the functions mentioned, so my apologies if this question can be answered with "read the docs".
In addition to np.polyfit and scipy.stats.linregress as suggested by @user2589273, a low-level way to do linear regression is to solve for the coefficients directly using np.linalg.lstsq. Although this approach is a bit more work than using one of the pre-packaged linear-regression functions, it's very useful to understand how it works at a basic level, particularly when you start dealing with multivariate data.
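For comparison, here's a quick sketch of what those two pre-packaged routines look like on your sample points:

import numpy as np
from scipy import stats

# your sample points
x = np.array([1, 4, 6, 8, 10])
y = np.array([1, 2, 4, 6, 11])

# np.polyfit with deg=1 fits a straight line and returns [slope, intercept]
slope, intercept = np.polyfit(x, y, 1)

# scipy.stats.linregress returns the same fit plus some regression statistics
slope2, intercept2, r_value, p_value, std_err = stats.linregress(x, y)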
Now, the low-level approach with np.linalg.lstsq:
import numpy as np
# a simple linear relationship: y = mx + c with m=0.5 and c=2
x = np.arange(50)
y = x * 0.5 + 2
y += np.random.randn(50) * 5 # add some noise
# we can rewrite the line equation as y = Ap, where A=[[x, 1]] and p=[[m], [c]]
A = np.c_[x, np.ones(50)]
# solving for p gives us the slope and intercept
# (rcond=None selects the newer default cutoff and avoids a FutureWarning)
p, residuals, rank, svals = np.linalg.lstsq(A, y, rcond=None)
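The first element of p is the fitted slope and the second is the intercept; given the noise added above, they should come out roughly near the true values:

m_fit, c_fit = p
print(m_fit, c_fit)  # roughly 0.5 and 2, up to the noise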
Plotting the fit:
from matplotlib import pyplot as plt
fig, ax = plt.subplots(1, 1)
ax.plot(x, y, 'ob', label='data')
ax.plot(x, A.dot(p), '-k', lw=2, label='linear fit')
ax.legend()
plt.show()
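Since the main advantage of this approach shows up with multivariate data, here's a minimal sketch of the same idea with two made-up predictors:

# y = a*x1 + b*x2 + c, with a=3, b=-2, c=1
x1 = np.random.rand(50)
x2 = np.random.rand(50)
y2 = 3 * x1 - 2 * x2 + 1 + np.random.randn(50) * 0.1

# one column per predictor, plus a column of ones for the intercept
A2 = np.c_[x1, x2, np.ones(50)]
coeffs, residuals, rank, svals = np.linalg.lstsq(A2, y2, rcond=None)
# coeffs should land close to [3, -2, 1]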