I have been trying to plot a trendline for a pandas Series. It mostly works, but I am getting multiple trendlines where I expect only one.
Here is my code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.read_excel( 'cleaned_wind_turbine_data.xlsx' , index_col = 'Date' , parse_dates = True )
df_columns = df.columns.to_list()
df_1 = df.loc[ '2021-02-01 00:00:00' : '2021-02-28 23:50:00' ]
z1 = np.polyfit( df_1['Wind speed (m/s)'] , df_1['Power ac (kW)'] , 6)
p1 = np.poly1d(z1)
plt.plot( df_1['Wind speed (m/s)'] , df_1['Power ac (kW)'] , 'bx' ,
df_1['Wind speed (m/s)'] , p1(df_1['Wind speed (m/s)']) , 'r--' , markersize = 0.5 , linewidth = 1)
I am not getting an error but I am getting multiple trendlines, why is that?
You are getting "multiple" trendlines because the values in your wind-speed column are not sorted. For example, your wind-speed array is probably something like
np.array([0.0,5.2,1.0,8.8])
matplotlib is going to draw a line between each of those points sequentially, so the fitted curve doubles back on itself and looks like several lines. Instead, for your best-fit line, you need an ordered x that is equally spaced (something like np.array([0.0,0.1,0.2...])).
To do that:
x_trendline = np.arange(df_1['Wind speed (m/s)'].min(), df_1['Wind speed (m/s)'].max(), 0.05)
y_trendline = p1(x_trendline)
Then, when you plot:
plt.plot( df_1['Wind speed (m/s)'] , df_1['Power ac (kW)'] , 'bx' ,
x_trendline, y_trendline , 'r--' , markersize = 0.5 , linewidth = 1)
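If you want to try the idea without your spreadsheet, here is a minimal sketch on made-up data. The helper name trendline_points, the synthetic wind_speed and power arrays, and the cubic relationship between them are all assumptions for illustration, not your real data. It uses np.linspace rather than np.arange so that the largest x value is included in the grid.

```python
import numpy as np

def trendline_points(x, y, degree=6, n_points=200):
    """Fit a polynomial to (x, y) and evaluate it on a sorted, evenly spaced x grid.

    Hypothetical helper for illustration; the name and defaults are assumptions.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    poly = np.poly1d(np.polyfit(x, y, degree))
    # linspace returns increasing values and includes both endpoints,
    # so the curve is drawn once from min(x) to max(x).
    x_line = np.linspace(x.min(), x.max(), n_points)
    return x_line, poly(x_line)

# Synthetic data in jumbled (unsorted) order, standing in for the real columns.
rng = np.random.default_rng(0)
wind_speed = rng.uniform(0.0, 25.0, size=500)
power = 0.5 * wind_speed**3 + rng.normal(0.0, 50.0, size=500)

x_line, y_line = trendline_points(wind_speed, power, degree=3)
```

You would then pass x_line and y_line to plt.plot in place of x_trendline and y_trendline. Another option, if you would rather evaluate the fit at the measured wind speeds, is to sort the data by wind speed before plotting the fitted line (for example with df_1.sort_values on the wind-speed column), which also removes the zigzag.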