
Adding a regression line to a plot given its intercept and slope


Using the following small dataset:

bill = [34,108,64,88,99,51]
tip =  [5,17,11,8,14,5]  

I calculated a best-fit regression line (by hand).

yi = 0.1462*x - 0.8188  # yi = slope*x + intercept
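
For reference, here is a minimal sketch of the by-hand calculation in plain Python, assuming the usual ordinary least-squares formulas (slope = Sxy / Sxx, intercept = mean(y) - slope * mean(x)). The function name `least_squares_line` is hypothetical. The unrounded intercept comes out as about -0.8203; the -0.8188 above is what you get by plugging the already-rounded slope 0.1462 into the intercept formula (10 - 0.1462*74).

```python
from statistics import mean

def least_squares_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line y = slope*x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)  # the centroid (x_bar, y_bar)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar  # this choice forces the line through the centroid
    return slope, intercept

bill = [34, 108, 64, 88, 99, 51]
tip = [5, 17, 11, 8, 14, 5]

slope, intercept = least_squares_line(bill, tip)
print(f"centroid: ({mean(bill)}, {mean(tip)})")                # centroid: (74, 10)
print(f"slope = {slope:.4f}, intercept = {intercept:.4f}")     # slope = 0.1462, intercept = -0.8203
```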

I've plotted my original data using Matplotlib like this:

import matplotlib.pyplot as plt

plt.scatter(bill,tip, color="black")
plt.xlim(20,120) #set ranges
plt.ylim(4,18)

#plot centroid point (mean of each variable (74,10))
line1 = plt.plot([74, 74],[0,10], ':', c="red")
line2 = plt.plot([0,74],[10,10],':', c="red")

plt.scatter(74,10, c="red")

#annotate the centroid point
plt.annotate('centroid (74,10)', xy=(74.1,10), xytext=(81,9),
        arrowprops=dict(facecolor="black", shrink=0.01),
        )

#label axes
plt.xlabel("Bill amount ($)")
plt.ylabel("Tip amount ($)")

#display plot
plt.show()

I am unsure how to get the regression line onto the plot itself. I'm aware that there are plenty of built-in tools for quickly fitting and displaying best-fit lines, but I did this as practice. I know I can start the line at the point (0, 0.8188) (the intercept), but I don't know how to use the slope value to complete the line (i.e. set the line's end points).

Given that y should increase by 0.1462 for each unit increase on the x axis, I tried (0, 0.8188) as the starting point and (100, 14.62) as the end point of the line. But this line does not pass through my centroid point; it just misses it.


Solution

  • The reasoning in the question is partially correct. Given a function f(x) = a*x + b, you can take its intersection with the y axis (x=0), namely (0, b), as the first point; here that is (0, -0.8188).
    Any other point on the line is given by (x, f(x)), i.e. (x, a*x + b). So the point at x=100 is (100, f(100)); plugging in gives (100, 0.1462*100 - 0.8188) = (100, 13.8012). In the question you just forgot to take b into account: the end point (100, 14.62) leaves it out entirely, and the start point uses +0.8188 instead of -0.8188.
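
A quick way to check a pair of endpoints is to evaluate f and test whether the straight line through them hits the centroid (74, 10). This sketch uses the rounded coefficients from the question; the helper `y_on_segment` is a hypothetical name introduced only for illustration.

```python
a, b = 0.1462, -0.8188  # slope and intercept from the question

def f(x):
    return a * x + b

def y_on_segment(p, q, x):
    """Linearly interpolate the y value at x on the straight line through points p and q."""
    (x0, y0), (x1, y1) = p, q
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

# Correct endpoints: both include the intercept b.
start, end = (0, f(0)), (100, f(100))
print(round(y_on_segment(start, end, 74), 4))  # 10.0, on the centroid

# Endpoints tried in the question: wrong sign on b at x=0, and b ignored at x=100.
wrong_start, wrong_end = (0, 0.8188), (100, 14.62)
print(round(y_on_segment(wrong_start, wrong_end, 74), 4))  # 11.0317, misses the centroid
```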

    The following shows how to use that function to plot the line in matplotlib:

    import matplotlib.pyplot as plt
    import numpy as np
    
    bill = [34,108,64,88,99,51]
    tip =  [5,17,11,8,14,5]  
    plt.scatter(bill, tip)
    
    #fit function
    f = lambda x: 0.1462*x - 0.8188
    # x values of line to plot
    x = np.array([0,100])
    # plot fit
    plt.plot(x,f(x),lw=2.5, c="k",label="fit line between 0 and 100")
    
    # better: limit the line to the range of the data (min to max of the x values)
    x = np.array([min(bill),max(bill)])
    plt.plot(x,f(x), c="orange", label="fit line between min and max")
    
    plt.legend()
    plt.show()
    

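If you only have a point and a slope, Matplotlib 3.3 and later also provide `axline`, which draws an infinite straight line through a given point with a given `slope`; passing the y-intercept (0, b) as that point matches the "given intercept and slope" setup directly, and the line is clipped to the axis limits, so no endpoints need to be chosen. A sketch, assuming Matplotlib 3.3 or later; the non-interactive Agg backend and the output filename `fit.png` are only there so the script runs headless (use `plt.show()` interactively instead).

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt

bill = [34, 108, 64, 88, 99, 51]
tip = [5, 17, 11, 8, 14, 5]
a, b = 0.1462, -0.8188  # slope and intercept computed by hand

fig, ax = plt.subplots()
ax.scatter(bill, tip, color="black")
# Infinite line through the y-intercept (0, b) with slope a;
# it is drawn across whatever axis limits are in effect.
ax.axline((0, b), slope=a, color="k", label="fit line")
ax.scatter(74, 10, c="red")  # centroid, which should lie on the line
ax.set_xlim(20, 120)
ax.set_ylim(4, 18)
ax.set_xlabel("Bill amount ($)")
ax.set_ylabel("Tip amount ($)")
ax.legend()
fig.savefig("fit.png")
```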

    Of course the fitting can also be done automatically. You can obtain the slope and intercept from a call to numpy.polyfit:

    # fit a degree-1 polynomial; polyfit returns [slope, intercept]
    a, b = np.polyfit(np.array(bill), np.array(tip), deg=1)
    f = lambda x: a*x + b
    

    The rest of the plotting code stays the same.
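
To compare the automatic fit with the hand calculation, here is a short self-contained check, assuming only NumPy. `np.polyfit` with `deg=1` returns the coefficients highest power first, so the first value is the slope and the second the intercept.

```python
import numpy as np

bill = np.array([34, 108, 64, 88, 99, 51])
tip = np.array([5, 17, 11, 8, 14, 5])

a, b = np.polyfit(bill, tip, deg=1)  # [slope, intercept]

def f(x):
    return a * x + b

print(f"slope = {a:.4f}, intercept = {b:.4f}")  # slope = 0.1462, intercept = -0.8203
# A least-squares line always passes through the centroid (mean x, mean y), here (74, 10).
print(f(bill.mean()), tip.mean())
```

The slope matches the hand value; the intercept differs slightly from -0.8188 because that value was computed from the slope already rounded to 0.1462.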