I have many days of hourly mean temperatures. These have all been normalised so that the minimum and maximum daily temperature equal 0 and 1 respectively, with each of the hourly mean temperatures being scaled to this range.
df
[out]:
Date_time 00:00 01:00 02:00 03:00 04:00 05:00 06:00 07:00 08:00 09:00 10:00 11:00 12:00 13:00 14:00 15:00 16:00 17:00 18:00 19:00 20:00 21:00 22:00 23:00 Max Min
0 2019-02-03 0.0875 0.0868 0.0440 0.0120 0.0108 0.0461 0.0961 0.2787 0.4908 0.6854 0.7379 0.8615 0.9284 0.8488 0.7711 0.2200 0.1617 0.2376 0.2211 0.1782 0.1700 0.1736 0.1174 0.1389 25.7 17.9
1 2019-03-07 0.0432 0.0432 0.0126 0.0011 0.0054 0.0065 0.0121 0.0592 0.2799 0.4322 0.7461 0.7475 0.8130 0.8599 0.6245 0.4815 0.4641 0.3502 0.2126 0.1878 0.1988 0.2114 0.2168 0.2292 21.6 17.9
2 2019-04-21 0.0651 0.0507 0.0324 0.0198 0.0703 0.0454 0.0457 0.2019 0.3700 0.5393 0.6593 0.7556 0.8682 0.9374 0.9593 0.9110 0.8721 0.6058 0.4426 0.3788 0.3447 0.3136 0.2564 0.1414 29.3 15.1
I wish to use SciPy optimize to fit a curve to the hourly pattern of temperature. I can't fit a curve to the full time series dataset because:
When plotting the full dataset up, this is the result:
It is to this that I'd like to fit a curve to, but all the examples on SciPy use datasets where there is only one value given for any time interval.
One option could be to calculate the mean for each hour and apply SciPy optimize to that. Is there another function I could use to apply a fit to the full dataset in Python?
As others mentioned there are many ways to fit the data. and it depends on your assumptions and what you are trying to do. Here is an example of doing a simple polynomial fit to your sample data.
First load and massage the data
from io import StringIO
from datetime import datetime as dt
import pandas as pd
import numpy as np
data = StringIO('''
Date_time 00:00 01:00 02:00 03:00 04:00 05:00 06:00 07:00 08:00 09:00 10:00 11:00 12:00 13:00 14:00 15:00 16:00 17:00 18:00 19:00 20:00 21:00 22:00 23:00 Max Min
0 2019-02-03 0.0875 0.0868 0.0440 0.0120 0.0108 0.0461 0.0961 0.2787 0.4908 0.6854 0.7379 0.8615 0.9284 0.8488 0.7711 0.2200 0.1617 0.2376 0.2211 0.1782 0.1700 0.1736 0.1174 0.1389 25.7 17.9
1 2019-03-07 0.0432 0.0432 0.0126 0.0011 0.0054 0.0065 0.0121 0.0592 0.2799 0.4322 0.7461 0.7475 0.8130 0.8599 0.6245 0.4815 0.4641 0.3502 0.2126 0.1878 0.1988 0.2114 0.2168 0.2292 21.6 17.9
2 2019-04-21 0.0651 0.0507 0.0324 0.0198 0.0703 0.0454 0.0457 0.2019 0.3700 0.5393 0.6593 0.7556 0.8682 0.9374 0.9593 0.9110 0.8721 0.6058 0.4426 0.3788 0.3447 0.3136 0.2564 0.1414 29.3 15.1
''')
df = pd.read_csv(data, sep = '\s+')
df2 = df.copy()
del df2['Date_time']
del df2['Max']
del df2['Min']
Extract the underlying hours and observatiobs, put into flattened arrays
hours = [dt.strptime(ts, '%H:%M').hour for ts in df2.columns]
raw_data = df2.values.flatten()
hours_rep = np.tile(hours, df2.values.shape[0])
Fit a polynomial of degree deg
(set below to 6). This will do a best-fit as input data has multiple observations for the same hour
deg = 6
p = np.polyfit(hours_rep, raw_data, deg = deg)
fit_data = np.polyval(p, hours)
Plot the results
import matplotlib.pyplot as plt
plt.plot(hours, df2.values.T, '.', label = 'obs')
plt.plot(hours, fit_data, 'o-', label = 'fit')
plt.legend(loc = 'best')
plt.show()