Search code examples
pandastime-series

Create extrapolated time series in Pandas from dataset


I have a time series of daily potential evaporation [mm/day] going back 11 years, but I need a time series going back to 1975. What I would like to do is calculate a "normal"/mean year from the data I have, and fill that into a time series with daily values all the way back to 1975.

I tried reindexing and resample into that df, but it didn't do the trick.

Below are some sample data:

                           epot [mm]
tid                                 
2011-01-01 00:00:00+00:00        0.3
2011-01-02 00:00:00+00:00        0.2
2011-01-03 00:00:00+00:00        0.1
2011-01-04 00:00:00+00:00        0.1
2011-01-05 00:00:00+00:00        0.1
                             ...
2021-12-27 00:00:00+00:00        0.1
2021-12-28 00:00:00+00:00        0.1
2021-12-29 00:00:00+00:00        0.1
2021-12-30 00:00:00+00:00        0.1
2021-12-31 00:00:00+00:00        0.1
         epot [mm]
count  4018.000000
mean      1.688477
std       1.504749
min       0.000000
25%       0.300000
50%       1.300000
75%       2.800000
max       5.900000

The plot shows the daily values, it shows that there isn't a lot of difference from year to year, hence using a mean year for all the years prior would probably be just fine.

enter image description here

EDIT:

I have managed to calculate a normalised year of all my data, using both min, mean, 0.9 quantile and max. Which is really useful. But I still struggle to take these values and putting them in a time series stretching over several years.

I used the groupby function to get this far.

df1 = E_pot_d.groupby([E_pot_d.index.month, E_pot_d.index.day]).agg(f)

df2 = df1.rolling(30, center=True, min_periods=10).mean().fillna(method='bfill')

df2
Out[75]: 
        epot [mm]                              
              min      mean     q0.90       max
tid tid                                        
1   1    0.046667  0.161818  0.280000  0.333333
    2    0.043750  0.165341  0.281250  0.337500
    3    0.047059  0.165775  0.282353  0.341176
    4    0.044444  0.169697  0.288889  0.344444
    5    0.042105  0.172249  0.300000  0.352632
          ...       ...       ...       ...
12  27   0.020000  0.137273  0.240000  0.290000
    28   0.021053  0.138278  0.236842  0.289474
    29   0.022222  0.138889  0.238889  0.288889
    30   0.017647  0.139572  0.241176  0.294118
    31   0.018750  0.140909  0.237500  0.293750

[366 rows x 4 columns]

Plot of normalised year


Solution

  • If you want to take the daily average of the current years and project it back to 1975, you can try this:

    s = pd.date_range("1975-01-01", "2010-12-31")
    extrapolated = (
        df.groupby(df.index.dayofyear)
        .mean()
        .join(pd.Series(s, index=s.dayofyear, name="tid"), how="outer")
        .set_index("tid")
        .sort_index()
    )
    
    # Combine the 2 data setes
    result = pd.concat([extrapolated, df])
    

    Note that this algorithm will give you the same value for Jan 1, 1975 and Jan 1, 1976, and Jan 1, 1977, etc. since they are the average of all Jan 1s from 2011 to 2021.