python pandas numpy matplotlib trendline

Matplotlib: plotting a trendline for a pandas Series

I have been trying to plot a trendline for a pandas Series and have managed to draw one, but I am getting multiple trendlines where I expect only one.

Here is my code:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

df = pd.read_excel('cleaned_wind_turbine_data.xlsx', index_col='Date', parse_dates=True)
df_columns = df.columns.to_list()

# Select February 2021
df_1 = df.loc['2021-02-01 00:00:00':'2021-02-28 23:50:00']

# Fit a 6th-degree polynomial of power against wind speed
z1 = np.polyfit(df_1['Wind Speed (m/s)'], df_1['Power ac (kW)'], 6)
p1 = np.poly1d(z1)

# Raw data as blue crosses, fitted polynomial as a red dashed line
plt.plot(df_1['Wind Speed (m/s)'], df_1['Power ac (kW)'], 'bx',
         df_1['Wind Speed (m/s)'], p1(df_1['Wind Speed (m/s)']), 'r--', markersize=0.5, linewidth=1)
plt.show()

I am not getting an error, but I am getting multiple trendlines. Why is that?

Solution

You are getting "multiple" trendlines because the values in your wind-speed column are not sorted. For example, your wind-speed array probably looks something like

np.array([0.0, 5.2, 1.0, 8.8])

Matplotlib draws a straight segment from each point to the next one in array order, so with unsorted x values the dashed line keeps doubling back across the plot, which looks like many overlapping trendlines. For the best-fit line you instead need x values sorted in increasing order; making them evenly spaced as well (something like np.array([0.0, 0.1, 0.2, ...])) gives a smooth curve.
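A minimal way to see this with only NumPy and the example array above: since plt.plot joins each point to the next, every change of direction in the x values is a place where the drawn line doubles back. The helper direction_reversals is hypothetical, written just for this illustration.

```python
import numpy as np

# Example wind speeds in recording order (not sorted), as in the array above.
wind_speed = np.array([0.0, 5.2, 1.0, 8.8])

def direction_reversals(x):
    """Count how often consecutive x steps change direction.

    plt.plot joins point i to point i + 1, so each reversal is a spot
    where the drawn line turns around and runs back across the plot.
    """
    steps = np.sign(np.diff(x))
    steps = steps[steps != 0]  # ignore repeated x values
    return int(np.sum(steps[1:] != steps[:-1]))

print(direction_reversals(wind_speed))           # 2
print(direction_reversals(np.sort(wind_speed)))  # 0
```

With real turbine data there are thousands of reversals, hence the bundle of red dashed lines.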

To build such an x and evaluate the fit on it:

x_trendline = np.arange(df_1['Wind Speed (m/s)'].min(), df_1['Wind Speed (m/s)'].max(), 0.05)
y_trendline = p1(x_trendline)
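Two alternatives to np.arange, sketched on made-up data (the random wind and power arrays stand in for df_1 and are purely hypothetical): np.linspace includes the maximum wind speed, whereas np.arange with a step stops short of it; or you can keep the measured x values and simply sort them with np.argsort before evaluating the fit.

```python
import numpy as np

# Hypothetical unsorted wind-speed / power samples standing in for df_1.
rng = np.random.default_rng(0)
wind = rng.uniform(0, 15, 200)
power = 0.5 * wind**3 + rng.normal(0, 20, wind.size)

p1 = np.poly1d(np.polyfit(wind, power, 6))

# Option 1: evenly spaced grid; linspace includes both endpoints.
x_grid = np.linspace(wind.min(), wind.max(), 300)
y_grid = p1(x_grid)

# Option 2: reuse the measured x values, but in increasing order.
order = np.argsort(wind)
x_sorted = wind[order]
y_sorted = p1(x_sorted)
```

Either x_grid or x_sorted can be passed to plt.plot for the trendline; the grid version gives a smoother curve when the data are sparse.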

Then plot the raw data and the trendline:

plt.plot(df_1['Wind Speed (m/s)'], df_1['Power ac (kW)'], 'bx',
         x_trendline, y_trendline, 'r--', markersize=0.5, linewidth=1)
plt.show()
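For a self-contained version you can run without the Excel file, here is a sketch that generates hypothetical wind/power data (the clipped cubic power curve is invented for illustration) and plots the scatter and the trendline in two separate calls, so markersize applies only to the markers and linewidth only to the fit line. The Agg backend and the trendline.png file name are assumptions made so it runs headless.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

# Hypothetical stand-in for df_1: wind speeds in recording order, not sorted.
rng = np.random.default_rng(42)
wind = rng.uniform(0, 15, 500)
power = np.clip(0.8 * wind**3, 0, 2000) + rng.normal(0, 30, wind.size)

p1 = np.poly1d(np.polyfit(wind, power, 6))
x_trendline = np.linspace(wind.min(), wind.max(), 300)

fig, ax = plt.subplots()
# Raw samples as markers only: order does not matter because no line is drawn.
ax.plot(wind, power, "bx", markersize=2, label="measured")
# Trendline evaluated on a sorted grid: one smooth curve.
ax.plot(x_trendline, p1(x_trendline), "r--", linewidth=1, label="degree-6 fit")
ax.set_xlabel("Wind speed (m/s)")
ax.set_ylabel("Power ac (kW)")
ax.legend()
fig.savefig("trendline.png")  # hypothetical output file name
```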