numpy apache-spark ibm-cloud linear-regression

How does numpy polyfit work?


I've created the "Precipitation Analysis" example Jupyter Notebook in the Bluemix Spark service.

Notebook Link: https://console.ng.bluemix.net/data/notebooks/3ffc43e2-d639-4895-91a7-8f1599369a86/view?access_token=effff68dbeb5f9fc0d2df20cb51bffa266748f2d177b730d5d096cb54b35e5f0

In cells In[34] and In[35] (you have to scroll down quite a bit), they use numpy's polyfit to calculate the trend for the given temperature data. However, I do not understand how to use it.

Can somebody explain it?


Solution

  • This question has been answered on developerWorks: https://developer.ibm.com/answers/questions/282350/how-does-numpy-polyfit-work.html

    I will try to explain each statement:

    index = chile[chile>0.0].index => this statement returns the index labels (the years) of all entries in the chile pandas Series whose values are greater than 0.0.
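
To make the filtering step concrete, here is a minimal sketch using a made-up Series. The data and the string year labels are assumptions for illustration only; the notebook's actual chile series may be shaped differently.

```python
import pandas as pd

# Hypothetical stand-in for the notebook's `chile` series:
# year labels (as strings) mapped to precipitation values.
chile = pd.Series(
    [0.0, 12.5, 0.0, 9.8, 11.1],
    index=["1961", "1962", "1963", "1964", "1965"],
)

mask = chile > 0.0          # boolean Series, True where the value is positive
index = chile[mask].index   # labels (years) of the positive entries only

# String labels must be converted to integers before they can be used as x values.
years = index.astype("int")

print(list(index))  # ['1962', '1964', '1965']
print(list(years))
```

Years with a value of 0.0 are dropped, so they do not influence the fit.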

     fit = np.polyfit(index.astype('int'), chile[index].values,1)
    

    This is the polyfit function call. It finds the least-squares polynomial coefficients (here the slope and intercept, because the degree is 1) for the given x (years) and y (precipitation in each year) values, selected by index (years) and passed in as arrays.
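
As a rough illustration of what polyfit returns, the sketch below fits made-up points that lie exactly on a known line. The numbers are invented and are not taken from the notebook.

```python
import numpy as np

# Made-up data lying exactly on y = 0.5*x - 900 (not from the notebook).
x = np.array([2000, 2001, 2002, 2003, 2004])
y = 0.5 * x - 900.0

# Degree 1 asks for a straight line.
# Coefficients come back highest power first: [slope, intercept].
fit = np.polyfit(x, y, 1)
slope, intercept = fit

print("slope:", slope)          # close to 0.5
print("intercept:", intercept)  # close to -900
```

With real, noisy data the points will not sit on the line, and polyfit returns the line that minimizes the sum of squared vertical distances to the points.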

     print("slope: " + str(fit[0]))
    

    The code below plots the data points, so that they can be compared against a straight line showing the trend:

     plt.plot(index, chile[index],'.')
    

    In particular, in the first statement below, the second argument is the straight-line equation for y, "y = mx + b", where m is the slope (fit[0]) and b is the intercept (fit[1]) that we found above using polyfit.

     plt.plot(index, fit[0]*index.astype('int') + fit[1], '-', color='red')
     plt.title("Precipitation Trend for Chile")
     plt.xlabel("Year")
     plt.ylabel("Precipitation (million cubic meters)")
     plt.show()
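
For reference, the hand-written m*x + b expression can also be evaluated with np.polyval. The sketch below uses hypothetical years and precipitation values to show that the two approaches agree; plotting is left out so that it runs without a display.

```python
import numpy as np

# Hypothetical years and precipitation values, for illustration only.
years = np.array([1990, 1991, 1992, 1993, 1994])
precip = np.array([10.0, 12.0, 11.0, 13.0, 14.0])

fit = np.polyfit(years, precip, 1)

# Writing out y = m*x + b by hand, as the plotting call does ...
manual = fit[0] * years + fit[1]
# ... gives the same values as letting NumPy evaluate the polynomial.
evaluated = np.polyval(fit, years)

print(np.allclose(manual, evaluated))
```

Using np.polyval is mainly a convenience: it keeps working unchanged if the degree passed to polyfit is later raised above 1.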
    

    I hope that helps.

    Thanks, Charles.