Search code examples
pythonnumpymatplotlibleast-squares

Least squares not working for a set of y's


I am trying to run a least square algorithm using numpy and is having trouble. Can someone please tell me what I am doing wrong in the given code? When I set y to be y = np.power(X, 1) + np.random.rand(20)*3 or some other reasonable function of x, everything is working fine. But for that particular y defined by those given y values, the plot I am getting is senseless.

Is this some kind of numerical problem?

import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

X = np.arange(1,21)
y = np.array([-0.00454712, -0.00457764, -0.0045166 , -0.00442505, -0.00427246,
       -0.00411987, -0.00378418, -0.003479  , -0.00314331, -0.00259399,
       -0.00213623, -0.00146484, -0.00082397, -0.00030518,  0.00027466,
        0.00076294,  0.00146484,  0.00192261,  0.00247192,  0.00314331])

#y = np.power(X, 1) + np.random.rand(20)*3

w = np.linalg.lstsq(X.reshape(20, 1), y)[0]

plt.plot(X, y, 'red')
plt.plot(X, X*w[0], 'blue')
plt.show()

Solution

  • Are you sure there is a linear relationship between what you are fitting and the y variable data?

    Using the code (y = np.power(X, 1) + np.random.rand(20)*3) from your example, you have a linear relationship built into the y variable itself (with some noise) which allows your plot to track relatively well with the linear equation.

    X = np.arange(1,21)
    
    #y = np.power(X, 1) + np.random.rand(20)*3
    
    w = np.linalg.lstsq(X.reshape(20, 1), y)[0]
    
    plt.plot(X, y, 'red')
    plt.plot(X, X*w[0], 'blue')
    plt.show()
    

    Plot

    However, when you alternate to something like your y variable

        y = np.array([-0.00454712, -0.00457764, -0.0045166 , -0.00442505, -0.00427246,
           -0.00411987, -0.00378418, -0.003479  , -0.00314331, -0.00259399,
           -0.00213623, -0.00146484, -0.00082397, -0.00030518,  0.00027466,
            0.00076294,  0.00146484,  0.00192261,  0.00247192,  0.00314331])    
    

    You end up with something less easy to fit.

    Plot2

    Looking at the documentation, if you are attempting to something that fits this set of values, you will need to build in a constant component in which case lstsq does not do by default. The docs state for lstsq

    Return the least-squares solution to a linear matrix equation.

    Solves the equation a x = b

    If you really want to fit the data to a linear equation, running code like the below will give you something that almost matches your original data. However, the data behind this process seems to have polynomial/exponential driver which would make polyfit better.

    X = np.arange(1,21)
    y = np.array([-0.00454712, -0.00457764, -0.0045166 , -0.00442505, -0.00427246,
           -0.00411987, -0.00378418, -0.003479  , -0.00314331, -0.00259399,
           -0.00213623, -0.00146484, -0.00082397, -0.00030518,  0.00027466,
            0.00076294,  0.00146484,  0.00192261,  0.00247192,  0.00314331])
    
    #y = np.power(X, 1) + np.random.rand(20)*3
    X2 = np.vstack([X, np.ones(len(X))]).T
    w = np.linalg.lstsq(X2, y)[0]
    
    
    plt.plot(X, y, 'red')
    plt.plot(X, X.dot(w[0])+w[1], 'blue')
    plt.show()
    

    Plot3