python pandas matplotlib trendline

How to make trend line go through the origin while plotting its R2 value - python


I am working with a dataframe df which looks like this:

index       var1      var2      var3
0           0.0       0.0       0.0 
10          43940.7   2218.3    6581.7
100         429215.0  16844.3   51682.7

I want to plot each variable, draw its trend line forced through the origin, and calculate and display the R2 value on the plot.

I found most of what I wanted in this post; however, the trend line there doesn't go through the origin, and I can't find a way to make it do so.

I tried manually setting the first point of the trend line to zero, but the result doesn't look right.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import r2_score  # assumed source of r2_score

for var in df.columns[1:]:
    fig, ax = plt.subplots(figsize=(10,7))
    
    x = df.index
    y = df[var]
    
    plt.plot(x,y,"+", ms=10, mec="k")
    # Ordinary degree-1 fit: slope z[0] and a free intercept z[1]
    z = np.polyfit(x, y, 1)
    y_hat = np.poly1d(z)(x)
    y_hat[0] = 0     ###--- Here I tried to replace the first value with 0 but it doesn't seem right to me.

    plt.plot(x, y_hat, "r--", lw=1)
    text = f"$y={z[0]:0.3f}\\;x{z[1]:+0.3f}$\n$R^2 = {r2_score(y,y_hat):0.3f}$"
    ax.text(0.05, 0.95, text, transform=ax.transAxes, fontsize=14, verticalalignment='top')
    

Is there any way of doing it? Any help would be greatly appreciated.


Solution

  • You could use SciPy's curve_fit for that. Define the trend line as y = a*x (no intercept term), so the fitted line is forced through the origin.

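    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.optimize import curve_fit
    
    # Model without an intercept, so the fitted line passes through (0, 0)
    def func(x, a):
        return a * x
    
    xdata = np.array([0, 10, 20, 30, 40])
    ydata = np.array([0, 12, 18, 35, 38])
    
    # popt holds the fitted parameters; here that is only the slope a
    popt, pcov = curve_fit(func, xdata, ydata)
    plt.scatter(xdata, ydata)
    plt.plot(xdata, func(xdata, *popt), "r--")
    plt.show()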
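
    The question also asks for the R2 value, which the snippet above does not compute. Here is a minimal sketch of one way to get it, not a definitive implementation. For the model y = a*x the least-squares slope has the closed form a = sum(x*y) / sum(x*x), so NumPy alone is enough. The helper names fit_through_origin and plot_through_origin are hypothetical, and R2 is assumed to be the usual 1 - SS_res/SS_tot with SS_tot taken around the mean of y (the same definition sklearn's r2_score uses). Other conventions exist for regression through the origin, so treat the number accordingly.

```python
import numpy as np


def fit_through_origin(x, y):
    """Return (slope, r2) for the least-squares fit of y = slope * x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Closed-form least-squares slope for a line with no intercept
    slope = np.sum(x * y) / np.sum(x * x)
    y_hat = slope * x
    # Assumption: R^2 = 1 - SS_res / SS_tot, with SS_tot centred on mean(y)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return slope, 1.0 - ss_res / ss_tot


def plot_through_origin(ax, x, y):
    """Draw the data, the through-origin trend line and an annotation on ax."""
    slope, r2 = fit_through_origin(x, y)
    x = np.asarray(x, dtype=float)
    ax.plot(x, y, "+", ms=10, mec="k")
    # Draw the line from x = 0 so it visibly starts at the origin
    xs = np.array([0.0, x.max()])
    ax.plot(xs, slope * xs, "r--", lw=1)
    text = f"$y={slope:0.3f}\\;x$\n$R^2 = {r2:0.3f}$"
    ax.text(0.05, 0.95, text, transform=ax.transAxes,
            fontsize=14, verticalalignment="top")
    return slope, r2


# Values of var1 from the question, with the index used as x
x = [0, 10, 100]
var1 = [0.0, 43940.7, 429215.0]
slope, r2 = fit_through_origin(x, var1)
print(f"slope={slope:.3f}, R^2={r2:.4f}")
```

    With the dataframe from the question, something like fig, ax = plt.subplots(figsize=(10,7)) followed by plot_through_origin(ax, df.index, df[var]) inside the loop over the columns should give one annotated plot per variable.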
    
