python, python-2.7, pandas, matplotlib, best-fit-curve

Line of best fit in Python for csv data set?


I'm making a very basic plot. I have a csv data set that looks like this:

1,280.6
2,280.2
3,276.6
4,279.6
5,277.4
6,279.4
7,274.2
8,278.2
9,276.4
10,279.4
11,274.6
12,276.2
13,274.4
14,277.8

and I am plotting it with matplotlib like this:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('dataset.csv', delimiter=',', header=None, names=['x', 'y'])

plt.plot(df['x'], df['y'], label='')

plt.xlabel('x')
plt.ylabel('y')
plt.title('Title')
plt.show()

which gives this: [image: a pretty graph]

From my own knowledge and from previous answers I've found on here, I know how to calculate a line of best fit when I am plotting a given equation, a range, or similar. But what would be the best way to find a line of best fit for a given set of data?

Thanks a lot!


Solution

  • For finding the line of best fit, I would recommend using scipy's linregress function (in scipy.stats), which fits a straight line to the data by least squares.

    from scipy.stats import linregress
    slope, intercept, r_value, p_value, std_err = linregress(df['x'], df['y'])
    

    Now that you have the slope and intercept, you can plot the line of best fit.
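
As a rough sketch of that last step, the snippet below overlays the fitted line on the data. It uses the sample values from the question inline so that it runs on its own; with the CSV loaded as in the question, you would pass df['x'] and df['y'] instead. The helper names fit_line and plot_with_fit are made up for this illustration and are not part of scipy or matplotlib.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import linregress

# Sample values copied from the question; with the CSV loaded you would
# pass df['x'] and df['y'] instead.
X = np.arange(1, 15)
Y = np.array([280.6, 280.2, 276.6, 279.6, 277.4, 279.4, 274.2,
              278.2, 276.4, 279.4, 274.6, 276.2, 274.4, 277.8])


def fit_line(x, y):
    """Return (slope, intercept) of the least-squares line through x, y."""
    slope, intercept, r_value, p_value, std_err = linregress(x, y)
    return slope, intercept


def plot_with_fit(x, y):
    """Plot the data and overlay the fitted line (illustrative helper)."""
    x = np.asarray(x, dtype=float)
    slope, intercept = fit_line(x, y)
    plt.plot(x, y, label='data')
    # Evaluate y = slope * x + intercept at every x to draw the line.
    plt.plot(x, slope * x + intercept, linestyle='--', label='best fit')
    plt.xlabel('x')
    plt.ylabel('y')
    plt.title('Title')
    plt.legend()
    plt.show()


if __name__ == '__main__':
    plot_with_fit(X, Y)
```

The other values returned by linregress (r_value, p_value, std_err) are ignored here; they are worth checking if you want a sense of how well a straight line describes the data.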