
Scipy.optimization linear function approximation


I've looked at the function approximation methods in scipy.optimize and, after reading the descriptions, concluded (maybe wrongly) that they approximate non-linear functions only.

For instance, suppose I have this sample output after calling zip() on x and y:

[(1,1),(4,2),(6,4),(8,6),(10,11)]

As you can see, a non-linear function approximates this data much better, but I need a linear fit for my purposes.

I admit I may have missed something in the documentation of the functions mentioned, so my apologies if this question can be answered with "read the docs".


Solution

  • In addition to np.polyfit and scipy.stats.linregress as suggested by @user2589273, a low-level way to do linear regression is to solve for the matrix of coefficients using np.linalg.lstsq. Although this approach is a bit more work than using one of the pre-packaged functions for doing linear regression, it's very useful to understand how this works at a basic level, in particular when you start dealing with multivariate data.

    For example:

    import numpy as np
    
    # a simple linear relationship: y = mx + c with m=0.5 and c=2
    x = np.arange(50)
    y = x * 0.5 + 2
    y += np.random.randn(50) * 5    # add some noise
    
    # we can rewrite the line equation as y = Ap, where A=[[x, 1]] and p=[[m], [c]]
    A = np.c_[x, np.ones(50)]
    
    # solving for p gives us the slope and intercept
    # rcond=None uses the new default cutoff and avoids a warning on newer NumPy
    p, residuals, rank, svals = np.linalg.lstsq(A, y, rcond=None)
    
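    The same machinery extends to multivariate data: add one column to A per predictor and solve for one coefficient per column. A quick sketch with made-up coefficients (the names x1, x2, A2 and the chosen values are mine, for illustration only):

    # multivariate case: y = 0.5*x1 - 2.0*x2 + 3
    rng = np.random.default_rng(0)
    x1 = np.arange(50)
    x2 = rng.normal(size=50)           # a second, independent predictor
    y2 = 0.5 * x1 - 2.0 * x2 + 3       # noise-free, so the fit is exact
    A2 = np.c_[x1, x2, np.ones(50)]    # one column per coefficient
    coeffs, *_ = np.linalg.lstsq(A2, y2, rcond=None)
    # coeffs recovers approximately [0.5, -2.0, 3.0]
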

    Plotting the fit:

    from matplotlib import pyplot as plt
    
    # note: ax.hold() was removed in Matplotlib 3.0; overplotting is the default
    fig, ax = plt.subplots(1, 1)
    ax.plot(x, y, 'ob', label='data')
    ax.plot(x, A.dot(p), '-k', lw=2, label='linear fit')
    ax.legend()
    

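    As a cross-check on the question's own sample points, the higher-level routines mentioned above give the linear fit in one call (a quick sketch, not part of the original answer):

    import numpy as np
    from scipy import stats

    x = np.array([1, 4, 6, 8, 10])
    y = np.array([1, 2, 4, 6, 11])

    # degree-1 polynomial fit returns [slope, intercept]
    m, c = np.polyfit(x, y, 1)

    # linregress gives the same fit plus fit-quality statistics
    result = stats.linregress(x, y)
    # both give slope ≈ 1.061 and intercept ≈ -1.357

    linregress also exposes result.rvalue and result.pvalue, which are handy for judging how well the straight line actually fits.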