I have a time series of daily potential evaporation [mm/day] going back 11 years, but I need a time series going back to 1975. What I would like to do is calculate a "normal"/mean year from the data I have, and fill that into a time series with daily values all the way back to 1975.
I tried reindexing and resample into that df, but it didn't do the trick.
Below are some sample data:
epot [mm]
tid
2011-01-01 00:00:00+00:00 0.3
2011-01-02 00:00:00+00:00 0.2
2011-01-03 00:00:00+00:00 0.1
2011-01-04 00:00:00+00:00 0.1
2011-01-05 00:00:00+00:00 0.1
...
2021-12-27 00:00:00+00:00 0.1
2021-12-28 00:00:00+00:00 0.1
2021-12-29 00:00:00+00:00 0.1
2021-12-30 00:00:00+00:00 0.1
2021-12-31 00:00:00+00:00 0.1
epot [mm]
count 4018.000000
mean 1.688477
std 1.504749
min 0.000000
25% 0.300000
50% 1.300000
75% 2.800000
max 5.900000
The plot shows the daily values, it shows that there isn't a lot of difference from year to year, hence using a mean year for all the years prior would probably be just fine.
EDIT:
I have managed to calculate a normalised year of all my data, using both min, mean, 0.9 quantile and max. Which is really useful. But I still struggle to take these values and putting them in a time series stretching over several years.
I used the groupby function to get this far.
df1 = E_pot_d.groupby([E_pot_d.index.month, E_pot_d.index.day]).agg(f)
df2 = df1.rolling(30, center=True, min_periods=10).mean().fillna(method='bfill')
df2
Out[75]:
epot [mm]
min mean q0.90 max
tid tid
1 1 0.046667 0.161818 0.280000 0.333333
2 0.043750 0.165341 0.281250 0.337500
3 0.047059 0.165775 0.282353 0.341176
4 0.044444 0.169697 0.288889 0.344444
5 0.042105 0.172249 0.300000 0.352632
... ... ... ...
12 27 0.020000 0.137273 0.240000 0.290000
28 0.021053 0.138278 0.236842 0.289474
29 0.022222 0.138889 0.238889 0.288889
30 0.017647 0.139572 0.241176 0.294118
31 0.018750 0.140909 0.237500 0.293750
[366 rows x 4 columns]
If you want to take the daily average of the current years and project it back to 1975, you can try this:
s = pd.date_range("1975-01-01", "2010-12-31")
extrapolated = (
df.groupby(df.index.dayofyear)
.mean()
.join(pd.Series(s, index=s.dayofyear, name="tid"), how="outer")
.set_index("tid")
.sort_index()
)
# Combine the 2 data setes
result = pd.concat([extrapolated, df])
Note that this algorithm will give you the same value for Jan 1, 1975 and Jan 1, 1976, and Jan 1, 1977, etc. since they are the average of all Jan 1s from 2011 to 2021.