python dataframe numpy matplotlib scatter-plot

Plotting regression line with log y scale


I have two things I want to plot: the original data and its regression line. Whenever I run this code, the regression line doesn't pass through the data at all. I think this is related to plotting the original data on a log scale for the y axis (I tried to account for this when calling polyfit, but I'm still having issues).

import numpy as np
import matplotlib.pyplot as plt

a = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
b = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])

plt.scatter(a, b)
plt.yscale('log')
slope, intercept = np.polyfit(a, np.log(b), 1)
plt.plot(a, (slope*a)+intercept)
plt.show()

Solution

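  • You are fitting log(b) = slope * a + intercept, which is equivalent to b = np.exp(slope*a + intercept). The line plotted in the question, slope*a + intercept, is therefore a value of log(b), not of b, so it does not line up with data plotted as b.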

    In matplotlib, you can either plot on a linear scale, with log(b) as the plotted variable:

    import numpy as np
    import matplotlib.pyplot as plt
    
    a = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
    b = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])
    
    slope, intercept = np.polyfit(a, np.log(b), 1)
    
    plt.figure()
    plt.scatter(a, np.log(b))
    plt.plot(a, (slope*a)+intercept)
    plt.show()

    In this case, do not call plt.yscale('log'): the plotted values are already log(b), so the axis itself must stay linear.

    Alternatively, you can plot the original values of b on a logarithmic y axis and map the fitted line back with np.exp:

    import numpy as np
    import matplotlib.pyplot as plt
    
    a = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
    b = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])
    
    slope, intercept = np.polyfit(a, np.log(b), 1)
    
    plt.figure()
    plt.yscale('log')
    plt.scatter(a, b)
    plt.plot(a, np.exp((slope*a)+intercept))
    plt.show()
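
As a quick numerical check of the explanation above, the sketch below compares the magnitude of the fitted line in log space with the data, without plotting anything. The names a_grid, log_fit and b_fit are introduced here for illustration and are not part of the original code; evaluating the fit on a sorted grid from np.linspace is also an assumption of this sketch, not something the answer requires.

```python
import numpy as np

a = np.array([5, 7, 8, 7, 2, 17, 2, 9, 4, 11, 12, 9, 6])
b = np.array([99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86])

# Fit a straight line to log(b); polyfit returns the highest-degree coefficient first.
slope, intercept = np.polyfit(a, np.log(b), 1)

# Evaluate the fit on a sorted, evenly spaced grid (hypothetical helper name).
a_grid = np.linspace(a.min(), a.max(), 100)
log_fit = slope * a_grid + intercept  # fitted values of log(b)
b_fit = np.exp(log_fit)               # the same line mapped back to the units of b

# The log-space line sits far below the raw data, which is why it misses
# the points when drawn on an axis that shows b.
print("range of log-space line:", log_fit.min(), log_fit.max())
print("range of data b:        ", b.min(), b.max())
print("range of exp(line):     ", b_fit.min(), b_fit.max())
```

If the first printed range is much smaller than the second, the line from the question cannot overlap the data on a log-scaled axis of b, while the third range should fall among the data values.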