python numpy matplotlib regression

How to plot a regression line


I cannot get a proper regression line from my quadratic fit. My a1 value is supposed to be positive, but it comes out negative. If I skip the masking step, my a1, b1 and c1 values all become NaN.

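import numpy as np
import matplotlib.dates
import matplotlib.pyplot
from numpy import polyfit, polyval

# options for reading the tab-separated file; column 0 holds the timestamps
kwargs = dict(delimiter = '\t',
              skip_header = 0,
              missing_values = 'NaN',
              converters = {0:matplotlib.dates.strpdate2num('%d-%m-%Y %H:%M')},
              dtype = float,
              names = True,
              )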

# raw string so the backslashes in the Windows path are not read as escape sequences
ratingcats = np.genfromtxt(r'C:\Users\ker\Documents\Discharge_and_stageheight_Catsop.txt',**kwargs)

dis_rat = ratingcats['discharge']   #change names of columns
stage_rat = ratingcats['stage']

#create regression line and mask NaN
dis_ratM = np.ma.masked_array(dis_rat,mask=np.isnan(dis_rat)).compressed()
stage_ratM = np.ma.masked_array(stage_rat,mask=np.isnan(dis_rat)).compressed()

a1,b1,c1 = polyfit(dis_ratM, stage_ratM, 2)

discharge_pred = polyval([a1,b1,c1],stage_ratM)

print (a1,b1,c1)

#create scatterplot

matplotlib.pyplot.scatter(stage_rat,dis_rat,color='red',label='Rating curve')
matplotlib.pyplot.plot(stage_ratM,discharge_pred,'r-',label='regression line')
matplotlib.pyplot.show()

Solution

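  • After going through your code a second time I noticed that the arguments for polyfit are in the wrong order. The signature of polyfit is numpy.polyfit(x, y, deg). You evaluate and plot the polynomial against stage, so stage has to be x and discharge has to be y; with the arguments reversed you fit stage as a function of discharge, and those coefficients do not describe the curve you are plotting.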

    Try using

    a1,b1,c1 = polyfit(stage_ratM, dis_ratM, 2)
    

    (Note the swapped order of stage_ratM and dis_ratM) and see if it works now.
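
    To check the argument order without the original data file, here is a minimal sketch that uses made-up numbers. The stage values and the quadratic used to generate discharge are assumptions chosen for illustration, not values from your file. It drops the rows with a missing value using one shared boolean mask, fits discharge as a function of stage, and evaluates the fit on a sorted grid so that a plotted line would not zigzag between unsorted points.

```python
import numpy as np

# Synthetic stand-in data (assumption, not from the original file):
# discharge = 2.0*stage**2 + 0.5*stage + 0.1
stage = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8])
discharge = 2.0 * stage**2 + 0.5 * stage + 0.1
discharge[3] = np.nan  # simulate one missing measurement

# One shared mask keeps the two arrays aligned row by row
valid = np.isfinite(stage) & np.isfinite(discharge)
stage_ok = stage[valid]
discharge_ok = discharge[valid]

# x = stage, y = discharge, degree 2
a1, b1, c1 = np.polyfit(stage_ok, discharge_ok, 2)

# Evaluate on an evenly spaced, sorted grid of stage values
stage_grid = np.linspace(stage_ok.min(), stage_ok.max(), 50)
discharge_pred = np.polyval([a1, b1, c1], stage_grid)

print(a1, b1, c1)
```

    With real data, stage_grid and discharge_pred would be passed to matplotlib.pyplot.plot in place of stage_ratM and discharge_pred from the question.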