Search code examples
pythonpandasscipycurve-fittingscipy-optimize

Using SciPy optimize to fit one curve to many data series


I have many days of hourly mean temperatures. These have all been normalised so that the minimum and maximum daily temperature equal 0 and 1 respectively, with each of the hourly mean temperatures being scaled to this range.

df
[out]:
    Date_time   00:00   01:00   02:00   03:00   04:00   05:00   06:00   07:00   08:00   09:00   10:00   11:00   12:00   13:00   14:00   15:00   16:00   17:00   18:00   19:00   20:00   21:00   22:00   23:00   Max     Min
0   2019-02-03  0.0875  0.0868  0.0440  0.0120  0.0108  0.0461  0.0961  0.2787  0.4908  0.6854  0.7379  0.8615  0.9284  0.8488  0.7711  0.2200  0.1617  0.2376  0.2211  0.1782  0.1700  0.1736  0.1174  0.1389  25.7    17.9
1   2019-03-07  0.0432  0.0432  0.0126  0.0011  0.0054  0.0065  0.0121  0.0592  0.2799  0.4322  0.7461  0.7475  0.8130  0.8599  0.6245  0.4815  0.4641  0.3502  0.2126  0.1878  0.1988  0.2114  0.2168  0.2292  21.6    17.9
2   2019-04-21  0.0651  0.0507  0.0324  0.0198  0.0703  0.0454  0.0457  0.2019  0.3700  0.5393  0.6593  0.7556  0.8682  0.9374  0.9593  0.9110  0.8721  0.6058  0.4426  0.3788  0.3447  0.3136  0.2564  0.1414  29.3    15.1

I wish to use SciPy optimize to fit a curve to the hourly pattern of temperature. I can't fit a curve to the full time series dataset because:

  • I want the fit to be generic, and applicable to any given day
  • The data is discontinuous, meaning that there would be gaps in the dataset that would cause poor fit

When plotting the full dataset up, this is the result:

Scatter plot of normalised hourly mean temperature

It is to this that I'd like to fit a curve to, but all the examples on SciPy use datasets where there is only one value given for any time interval.

One option could be to calculate the mean for each hour and apply SciPy optimize to that. Is there another function I could use to apply a fit to the full dataset in Python?


Solution

  • As others mentioned there are many ways to fit the data. and it depends on your assumptions and what you are trying to do. Here is an example of doing a simple polynomial fit to your sample data.

    First load and massage the data

    from io import StringIO
    from datetime import datetime as dt
    import pandas as pd
    import numpy as np
    data = StringIO('''
        Date_time   00:00   01:00   02:00   03:00   04:00   05:00   06:00   07:00   08:00   09:00   10:00   11:00   12:00   13:00   14:00   15:00   16:00   17:00   18:00   19:00   20:00   21:00   22:00   23:00   Max     Min
    0   2019-02-03  0.0875  0.0868  0.0440  0.0120  0.0108  0.0461  0.0961  0.2787  0.4908  0.6854  0.7379  0.8615  0.9284  0.8488  0.7711  0.2200  0.1617  0.2376  0.2211  0.1782  0.1700  0.1736  0.1174  0.1389  25.7    17.9
    1   2019-03-07  0.0432  0.0432  0.0126  0.0011  0.0054  0.0065  0.0121  0.0592  0.2799  0.4322  0.7461  0.7475  0.8130  0.8599  0.6245  0.4815  0.4641  0.3502  0.2126  0.1878  0.1988  0.2114  0.2168  0.2292  21.6    17.9
    2   2019-04-21  0.0651  0.0507  0.0324  0.0198  0.0703  0.0454  0.0457  0.2019  0.3700  0.5393  0.6593  0.7556  0.8682  0.9374  0.9593  0.9110  0.8721  0.6058  0.4426  0.3788  0.3447  0.3136  0.2564  0.1414  29.3    15.1
    ''')
    df = pd.read_csv(data, sep = '\s+')
    df2 = df.copy()
    del df2['Date_time']
    del df2['Max']
    del df2['Min']
    

    Extract the underlying hours and observatiobs, put into flattened arrays

    hours = [dt.strptime(ts, '%H:%M').hour for ts in df2.columns]
    raw_data = df2.values.flatten()
    hours_rep = np.tile(hours, df2.values.shape[0])
    

    Fit a polynomial of degree deg (set below to 6). This will do a best-fit as input data has multiple observations for the same hour

    deg = 6
    p = np.polyfit(hours_rep, raw_data, deg = deg)
    fit_data = np.polyval(p, hours)
    

    Plot the results

    import matplotlib.pyplot as plt
    plt.plot(hours, df2.values.T, '.', label = 'obs')
    plt.plot(hours, fit_data, 'o-', label = 'fit')
    plt.legend(loc = 'best')
    plt.show()
    

    This is how it looks like enter image description here