I'm making a very basic plot. I have a CSV data set that looks like this:
1,280.6
2,280.2
3,276.6
4,279.6
5,277.4
6,279.4
7,274.2
8,278.2
9,276.4
10,279.4
11,274.6
12,276.2
13,274.4
14,277.8
and I am plotting it with matplotlib like this:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('dataset.csv', delimiter=',',header=None,names=['x','y'])
plt.plot(df['x'], df['y'])  # color=current_palette dropped: current_palette is not defined in this snippet
plt.xlabel('x')
plt.ylabel('y')
plt.title('Title')
plt.show()
which gives this: a pretty graph
From my own knowledge and from previous answers I've found on here, I know how to calculate a line of best fit when I am plotting a given equation, a range, or similar. But what would be the best way to find a line of best fit for a given set of data?
Thanks a lot!
To find the line of best fit, I would recommend the linregress function from scipy.stats, which performs an ordinary least-squares linear regression.
from scipy.stats import linregress
slope, intercept, r_value, p_value, std_err = linregress(df['x'], df['y'])
Now that you have the slope and intercept, you can plot the line of best fit.
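As a rough sketch of that last step, something like the following should work. The data is copied inline from the question so the snippet runs on its own; with your DataFrame you would pass df['x'] and df['y'] instead. The names result, y_fit, fig and ax are my own choices for illustration, not anything required by scipy or matplotlib.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import linregress

# Data copied from the question (assumption: same values as dataset.csv).
# With the DataFrame from the question, use df['x'] and df['y'] here.
x = np.arange(1, 15)
y = np.array([280.6, 280.2, 276.6, 279.6, 277.4, 279.4, 274.2,
              278.2, 276.4, 279.4, 274.6, 276.2, 274.4, 277.8])

# Least-squares fit of a straight line to the data.
result = linregress(x, y)
slope, intercept = result.slope, result.intercept

# Evaluate the fitted line y = slope * x + intercept at the same x values.
y_fit = slope * x + intercept

# Draw the original data and the fitted line on the same axes.
fig, ax = plt.subplots()
ax.plot(x, y, label='data')
ax.plot(x, y_fit, linestyle='--',
        label=f'fit: y = {slope:.3f}x + {intercept:.2f}')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_title('Title')
ax.legend()
# Call plt.show() here to display the figure interactively.
```

If you only need the line and not the extra statistics (r_value, p_value, std_err), np.polyfit(x, y, 1) should give you the same slope and intercept.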