Search code examples
pythonpandasrowscalculationdivide

Pandas: Divide 1 dataframe of monthly data by 2nd dataframe with monthly averages


i have monthly data in a dataframe df1 that I need to divide by a second dataframe df2 to calculate the variability. My df1 looks like this below. I have tried to find an answer but cannot see how to divide each monthly value by it's unique monthly long-term (2019-2021) average that is given here.

index plant_name  year      month   wind_speed_ms
324  CAETITE III  2019      1       8.188548
325  CAETITE III  2019      2       5.506577
326  CAETITE III  2019      3       6.449032
327  CAETITE III  2019      4       6.077875
328  CAETITE III  2019      5       6.932110
329  CAETITE III  2019      6       8.905403
330  CAETITE III  2019      7       8.180067
331  CAETITE III  2019      8       9.252164
332  CAETITE III  2019      9       8.850250
333  CAETITE III  2019     10       8.013858
334  CAETITE III  2019     11       6.940972
335  CAETITE III  2019     12       7.244973
336  CAETITE III  2020      1       5.690524
337  CAETITE III  2020      2       5.782543
338  CAETITE III  2020      3       6.387433
339  CAETITE III  2020      4       7.097597
340  CAETITE III  2020      5       7.142957
341  CAETITE III  2020      6       8.653625
342  CAETITE III  2020      7       8.509987
343  CAETITE III  2020      8       9.164261
344  CAETITE III  2020      9       9.219514
345  CAETITE III  2020     10       7.110269
346  CAETITE III  2020     11       6.884597
347  CAETITE III  2020     12       6.144812
348  CAETITE III  2021      1       7.082836
349  CAETITE III  2021      2       5.706280
350  CAETITE III  2021      3       7.029341
351  CAETITE III  2021      4       6.676764
352  CAETITE III  2021      5       7.318804
353  CAETITE III  2021      6       7.403028
354  CAETITE III  2021      7       7.811183
355  CAETITE III  2021      8       9.243642

My second dataframe - df2 - looks like this and this is the monthly averages or the denominator:

df2:

     plant_name  month  wind_speed_ms
0   CAETITE III      1       6.764807
1   CAETITE III      2       6.624946
2   CAETITE III      3       6.661645
3   CAETITE III      4       7.127637
4   CAETITE III      5       7.361365
5   CAETITE III      6       8.416312
6   CAETITE III      7       8.746169
7   CAETITE III      8       8.915520
8   CAETITE III      9       8.824938
9   CAETITE III     10       8.037235
10  CAETITE III     11       6.810558
11  CAETITE III     12       6.435099

The answer needs to look like this:

index   plant_name  year    month   var
0   CAETITE III 2019    1   21.046
1   CAETITE III 2019    2   -16.881
2   CAETITE III 2019    3   -3.192
3   CAETITE III 2019    4   -14.728
4   CAETITE III 2019    5   -5.831
5   CAETITE III 2019    6   5.811
6   CAETITE III 2019    7   -6.473
7   CAETITE III 2019    8   3.776
8   CAETITE III 2019    9   0.287
9   CAETITE III 2019    10  -0.291
10  CAETITE III 2019    11  1.915
11  CAETITE III 2019    12  12.585
12  CAETITE III 2020    1   -15.880
13  CAETITE III 2020    2   -12.716
14  CAETITE III 2020    3   -4.116
15  CAETITE III 2020    4   -0.421
16  CAETITE III 2020    5   -2.967
17  CAETITE III 2020    6   2.820
18  CAETITE III 2020    7   -2.700
19  CAETITE III 2020    8   2.790
20  CAETITE III 2020    9   4.471
21  CAETITE III 2020    10  -11.533
22  CAETITE III 2020    11  1.087
23  CAETITE III 2020    12  -4.511
24  CAETITE III 2021    1   4.701
25  CAETITE III 2021    2   -13.867
26  CAETITE III 2021    3   5.520
27  CAETITE III 2021    4   -6.326
28  CAETITE III 2021    5   -0.578
29  CAETITE III 2021    6   -12.040
30  CAETITE III 2021    7   -10.690
31  CAETITE III 2021    8   3.680

Thank you,


Solution

  • You can match the month of df1 and df2 to get the average wind speed from df2 by using .map(). Then, divide the wind speed of df1 by the average wind speed just mapped, minus 1 and multiply by 100 to get the required values of var, as follows:

    df1['var'] = (df1['wind_speed_ms'] / df1['month'].map(df2.set_index('month')['wind_speed_ms']) - 1) * 100
    

    Result:

    print(df1)
    
         plant_name  year  month  wind_speed_ms        var
    0   CAETITE III  2019      1       8.188548  21.046291
    1   CAETITE III  2019      2       5.506577 -16.881179
    2   CAETITE III  2019      3       6.449032  -3.191599
    3   CAETITE III  2019      4       6.077875 -14.728051
    4   CAETITE III  2019      5       6.932110  -5.831188
    5   CAETITE III  2019      6       8.905403   5.811227
    6   CAETITE III  2019      7       8.180067  -6.472571
    7   CAETITE III  2019      8       9.252164   3.775932
    8   CAETITE III  2019      9       8.850250   0.286824
    9   CAETITE III  2019     10       8.013858  -0.290859
    10  CAETITE III  2019     11       6.940972   1.914880
    11  CAETITE III  2019     12       7.244973  12.585261
    12  CAETITE III  2020      1       5.690524 -15.880468
    13  CAETITE III  2020      2       5.782543 -12.715621
    14  CAETITE III  2020      3       6.387433  -4.116281
    15  CAETITE III  2020      4       7.097597  -0.421458
    16  CAETITE III  2020      5       7.142957  -2.966950
    17  CAETITE III  2020      6       8.653625   2.819679
    18  CAETITE III  2020      7       8.509987  -2.700405
    19  CAETITE III  2020      8       9.164261   2.789977
    20  CAETITE III  2020      9       9.219514   4.471148
    21  CAETITE III  2020     10       7.110269 -11.533394
    22  CAETITE III  2020     11       6.884597   1.087121
    23  CAETITE III  2020     12       6.144812  -4.510995
    24  CAETITE III  2021      1       7.082836   4.701228
    25  CAETITE III  2021      2       5.706280 -13.866770
    26  CAETITE III  2021      3       7.029341   5.519598
    27  CAETITE III  2021      4       6.676764  -6.325701
    28  CAETITE III  2021      5       7.318804  -0.578167
    29  CAETITE III  2021      6       7.403028 -12.039525
    30  CAETITE III  2021      7       7.811183 -10.690235
    31  CAETITE III  2021      8       9.243642   3.680346