I have the below dataframe and I need to subtract and find the delta between next quater and current quater plus next to next quater and current quater.
Input data:
Number Name Year Quater value
1 an 2018 1 2.5
2 bn 2018 1 1.5
1 an 2018 2 3.5
2 bn 2018 2 4.5
1 an 2018 3 4.5
1 an 2018 4 2.5
2 bn 2018 4 1.5
1 an 2019 1 5.5
2 bn 2019 1 1.5
Output: d_1
is the difference between next quater and current quater for same number and name, whereas d_2
is the diff between next to next quater and current quater for same number and name.
Number Name Year Quater value d_1 d_2
1 an 2018 1 2.5 1 2
2 bn 2018 1 1.5 3
1 an 2018 2 3.5 1 -1
2 bn 2018 2 4.5 -3
1 an 2018 3 4.5 -2 1
1 an 2018 4 2.5 3
2 bn 2018 4 1.5 0
1 an 2019 1 5.5
2 bn 2019 1 1.5
First, make sure the data is correctly sorted:
df = df.sort_values(by=['Year', 'Quater'])
Then use groupby
and diff
to calculate the differences between rows:
df['d_1'] = df.groupby(['Number', 'Name'])['value'].diff(periods=-1) * -1
df['d_2'] = df.groupby(['Number', 'Name'])['value'].diff(periods=-2) * -1
df = df.fillna(0.0)
Result:
Number Name Year Quater value d_1 d_2
1 an 2018 1 2.5 1.0 2.0
2 bn 2018 1 1.5 3.0 -0.0
1 an 2018 2 3.5 1.0 -1.0
2 bn 2018 2 4.5 -3.0 -3.0
1 an 2018 3 4.5 -2.0 1.0
1 an 2018 4 2.5 3.0 0.0
2 bn 2018 4 1.5 -0.0 0.0
1 an 2019 1 5.5 0.0 0.0
2 bn 2019 1 1.5 0.0 0.0
Edit:
If missing values should be considered as well, the best way is to resample
the dataframe first to add any missing rows.
First, add a new column date
and resample the dataframe:
df['date'] = pd.to_datetime(df['Year'].astype(str) + 'Q' + df['Quater'].astype(str))
df = df.set_index('date').groupby(['Number', 'Name']).resample('Q').first().drop(['Name', 'Number'], axis=1).reset_index()
Now we have:
Number Name date Year Quater value
1 an 2018-03-31 2018 1.0 2.5
1 an 2018-06-30 2018 2.0 3.5
1 an 2018-09-30 2018 3.0 4.5
1 an 2018-12-31 2018 4.0 2.5
1 an 2019-03-31 2019 1.0 5.5
2 bn 2018-03-31 2018 1.0 1.5
2 bn 2018-06-30 2018 2.0 4.5
2 bn 2018-09-30 NaN NaN NaN
2 bn 2018-12-31 2018 4.0 1.5
2 bn 2019-03-31 2019 1.0 1.5
Now apply the same groupby
and diff
as above as well as dropping extra rows, the date
column and sort the result:
df['d_1'] = df.groupby(['Number', 'Name'])['value'].diff(periods=-1) * -1
df['d_2'] = df.groupby(['Number', 'Name'])['value'].diff(periods=-2) * -1
df.dropna(subset=['Year']).fillna(0.0).sort_values(by=['Year', 'Quater']).drop('date', axis=1)
The fillna
can be skipped if you prefer to keep the NaNs.