Search code examples
pythonpandasdataframerowssubtraction

Pandas: Subtracting a dataframe from a smaller dataframe: Start again when out of rows


I got two dataframes like this

df1

    date    value
0   2019-Jan    200
1   2019-Feb    200
2   2019-Mrz    300
3   2019-Apr    300
4   2019-Mai    400
5   2019-Jun    400
6   2019-Jul    500
7   2019-Aug    500
8   2019-Sep    600
9   2019-Okt    600
10  2019-Nov    700
11  2019-Dez    700

df2

    date    value
0   2020-Jan    100
1   2020-Feb    200
2   2020-Mrz    300
3   2020-Apr    100
4   2020-Mai    200 
5   2020-Jun    300
6   2020-Jul    100
7   2020-Aug    200
8   2020-Sep    300
9   2020-Okt    100
10  2020-Nov    200
11  2020-Dez    300
12  2021-Jan    100
13  2021-Feb    200
14  2021-Mrz    300
15  2021-Apr    100
16  2021-Mai    200
17  2021-Jun    300
18  2021-Jul    100
19  2021-Aug    200
20  2021-Sep    300

What I would like to get is a new dataframe with the difference of each month in df2 to the corresponding month in df1. It should look like this

output

    date    value
0   2020-Jan    -100
1   2020-Feb    0
2   2020-Mrz    0
3   2020-Apr    -200
4   2020-Mai    -200
5   2020-Jun    -100
6   2020-Jul    -400
7   2020-Aug    -300
8   2020-Sep    -300
9   2020-Okt    -500
10  2020-Nov    -500
11  2020-Dez    -400
12  2021-Jan    -100
13  2021-Feb    0
14  2021-Mrz    0
15  2021-Apr    -200
16  2021-Mai    -200
17  2021-Jun    -100
18  2021-Jul    -400
19  2021-Aug    -300
20  2021-Sep    -300

As df2 gets bigger over time (when the October, November, December values come in), I would like to have a solution that keeps working no matter what the number of rows in df2 is. I was wondering if there is a way to subtract df1 from the first twelve rows of df2 and then start again with rows 13 and following.

(Obviously, in the real data the numbers are not distributed evenly as in this example)


Solution

  • new_col = df2['value'].sub(df2['date'].str.split('-').str[-1].map(dict(zip(df1['date'].str.split('-').str[-1], df1['value']))))
    

    Output:

    >>> new_col
    0    -100
    1       0
    2       0
    3    -200
    4    -200
    5    -100
    6    -400
    7    -300
    8    -300
    9    -500
    10   -500
    11   -400
    12   -100
    13      0
    14      0
    15   -200
    16   -200
    17   -100
    18   -400
    19   -300
    20   -300
    dtype: int64