numpy apache-spark ibm-cloud linear-regression

How does numpy polyfit work?

I've created the "Precipitation Analysis" example Jupyter Notebook in the Bluemix Spark service.

Notebook Link: https://console.ng.bluemix.net/data/notebooks/3ffc43e2-d639-4895-91a7-8f1599369a86/view?access_token=effff68dbeb5f9fc0d2df20cb51bffa266748f2d177b730d5d096cb54b35e5f0

So in In[34] and In[35] (you have to scroll a lot) they use numpy polyfit to calculate the trend for given temperature data. However, I do not understand how to use it.

Can somebody explain it?

Solution

The question has been answered on developerWorks: https://developer.ibm.com/answers/questions/282350/how-does-numpy-polyfit-work.html

I will try to explain each statement:

index = chile[chile>0.0].index => This statement returns the years (the index labels of the chile pandas Series) whose values are greater than 0.0.
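To make the filtering step concrete, here is a minimal sketch on made-up data. The values and the string year labels are assumptions for illustration (the notebook's call to astype('int') suggests the years are not stored as integers there), not the notebook's actual data.

```python
import pandas as pd

# Hypothetical precipitation values indexed by year (years stored as strings)
chile = pd.Series([0.0, 12.5, 0.0, 9.8], index=["2000", "2001", "2002", "2003"])

mask = chile > 0.0           # boolean Series, True where the value is positive
index = chile[mask].index    # index labels (years) of the rows that passed the filter

years = index.astype("int")  # convert the year labels to integers for numeric use
print(list(index), list(years))
```

Only the years with a positive value remain, which is why the later statements use chile[index] rather than the whole series.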

 fit = np.polyfit(index.astype('int'), chile[index].values,1)

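This polyfit call computes the least-squares polynomial fitting coefficients for the given x values (the years) and y values (the precipitation in each of those years), both passed as vectors selected by index. Because the degree argument is 1, the result holds two coefficients: the slope in fit[0] and the intercept in fit[1].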

 print("slope: " + str(fit[0]))
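As a quick sanity check of what polyfit returns, the sketch below fits points that lie exactly on a known line. The numbers are invented for illustration; with real, noisy data the coefficients would only approximate the trend.

```python
import numpy as np

# Made-up points lying exactly on the line y = 2x - 3990
x = np.array([2000, 2001, 2002, 2003, 2004])
y = 2.0 * x - 3990.0

# Degree 1 fit: coefficients come back highest power first
fit = np.polyfit(x, y, 1)
slope, intercept = fit

print("slope: " + str(slope))
print("intercept: " + str(intercept))
```

The recovered slope and intercept should match 2 and -3990 up to floating-point error, which shows that fit[0] is m and fit[1] is b in y = mx + b.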

The code below simply plots the data points together with a straight line to show the trend:

 plt.plot(index, chile[index],'.')

In particular, in the statement below the second argument is the straight-line equation for y, "y = mx + b", where m is the slope (fit[0]) and b is the intercept (fit[1]) that we found above using polyfit.

 plt.plot(index, fit[0]*index.astype('int') + fit[1], '-', color='red')
 plt.title("Precipitation Trend for Chile")
 plt.xlabel("Year")
 plt.ylabel("Precipitation (million cubic meters)")
 plt.show()
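If it helps, the trend line can also be evaluated with np.polyval instead of writing out the equation by hand. This is just an alternative sketch on invented values, not what the notebook does.

```python
import numpy as np

# Made-up years and precipitation values for illustration only
years = np.array([2000, 2001, 2002, 2003])
values = np.array([10.0, 12.0, 11.0, 14.0])

fit = np.polyfit(years, values, 1)

# The line written out by hand, as in the notebook: y = m*x + b
manual = fit[0] * years + fit[1]

# The same line evaluated by NumPy from the coefficient vector
with_polyval = np.polyval(fit, years)

print(np.allclose(manual, with_polyval))
```

Both arrays hold the fitted y value for each year, so either one can be passed as the second argument to plt.plot to draw the red line.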

I hope that helps.

Thanks, Charles.