I have a dataframe with timeindex. There is a period (cycle) that starts every august. I want to calculate the difference between the value of each month and the value that was on the previous august (the beginning of its period). The aim is to know how much the values have changed during each cycle.
This is an small example of the data:
import pandas as pd
import numpy as np
df = pd.DataFrame({'date': ['2022-01-01 00:00:00', '2022-02-01 00:00:00',
'2021-03-01 00:00:00', '2021-04-01 00:00:00',
'2021-05-01 00:00:00', '2021-06-01 00:00:00',
'2021-07-01 00:00:00', '2021-08-01 00:00:00',
'2021-09-01 00:00:00', '2021-10-01 00:00:00',
'2021-11-01 00:00:00', '2021-12-01 00:00:00',
'2022-01-01 00:00:00', '2022-02-01 00:00:00',
'2022-03-01 00:00:00', '2022-04-01 00:00:00',
'2022-05-01 00:00:00', '2022-06-01 00:00:00',
'2022-07-01 00:00:00', '2022-08-01 00:00:00',
'2022-09-01 00:00:00', '2022-10-01 00:00:00',
'2022-11-01 00:00:00', '2023-12-01 00:00:00',
'2023-01-01 00:00:00', '2023-02-01 00:00:00',
'2023-03-01 00:00:00', '2023-04-01 00:00:00',
'2023-05-01 00:00:00', '2023-06-01 00:00:00',
],
'value1': [0.452762281,0.372262281,0.513928948,0.447762281,
0.377095615,0.355095615,0.271428948,0.291762281,
0.476762281,0.335928948,0.280428948,0.283762281,
0.322928948,0.287262281,0.316928948,0.209262281,
0.407928948,0.254262281,0.232095615,0.264262281,
0.076095615,-0.025237719,-0.042237719,-0.094904385,
0.017428948,-0.036071052,-0.094071052,-0.071404385,
0.008095615,-0.141571052],
'value2': [9.6,8,8.4,6.2,6.2,6,3.9,8.5,8.3,5.3,5.6,5.3,
6.2,6.3,6.9,4.8,6.7,3.6,3,4.6,2.3,1.3,1,0.3,
1.6,0.4,1.5,1.4,2.2,1.2]})
df = df.set_index('date')
I have tried to get a dataframe with the same structure but all the values have the value that was on its corresponding august, and then I just need to subtract one dataframe from the other. But I couldn't find a way to do it neither.
Thank you vary much in advance.
If I understand you correctly, you can use .groupby()
and then calculate the difference in each group:
df.index = pd.to_datetime(df.index)
tmp = (df.index.month == 8).cumsum()
out = (
df[tmp != 0]
.groupby(tmp[tmp > 0], group_keys=False)
.apply(lambda x: x[["value1", "value2"]] - x[["value1", "value2"]].iloc[0])
).add_prefix('calculated_')
x = df.join(out)
print(x)
Prints:
value1 value2 calculated_value1 calculated_value2
date
2021-03-01 0.513929 8.4 NaN NaN
2021-04-01 0.447762 6.2 NaN NaN
2021-05-01 0.377096 6.2 NaN NaN
2021-06-01 0.355096 6.0 NaN NaN
2021-07-01 0.271429 3.9 NaN NaN
2021-08-01 0.291762 8.5 0.000000 0.0
2021-09-01 0.476762 8.3 0.185000 -0.2
2021-10-01 0.335929 5.3 0.044167 -3.2
2021-11-01 0.280429 5.6 -0.011333 -2.9
2021-12-01 0.283762 5.3 -0.008000 -3.2
2022-01-01 0.452762 9.6 0.031167 -2.3
2022-01-01 0.322929 6.2 0.031167 -2.3
2022-02-01 0.372262 8.0 -0.004500 -2.2
2022-02-01 0.287262 6.3 -0.004500 -2.2
2022-03-01 0.316929 6.9 0.025167 -1.6
2022-04-01 0.209262 4.8 -0.082500 -3.7
2022-05-01 0.407929 6.7 0.116167 -1.8
2022-06-01 0.254262 3.6 -0.037500 -4.9
2022-07-01 0.232096 3.0 -0.059667 -5.5
2022-08-01 0.264262 4.6 0.000000 0.0
2022-09-01 0.076096 2.3 -0.188167 -2.3
2022-10-01 -0.025238 1.3 -0.289500 -3.3
2022-11-01 -0.042238 1.0 -0.306500 -3.6
2023-01-01 0.017429 1.6 -0.246833 -3.0
2023-02-01 -0.036071 0.4 -0.300333 -4.2
2023-03-01 -0.094071 1.5 -0.358333 -3.1
2023-04-01 -0.071404 1.4 -0.335667 -3.2
2023-05-01 0.008096 2.2 -0.256167 -2.4
2023-06-01 -0.141571 1.2 -0.405833 -3.4
2023-12-01 -0.094904 0.3 -0.359167 -4.3