Search code examples
pythonpandasdataframegroup-by

Doing group calculations with two separate dataframes in python


I have two pandas dataframes like this:

df1= pd.DataFrame({'sub-group':['2020','2030','2040','2030','2040','2030','2040'], 
                      'group':['a', 'a', 'a', 'b', 'b', 'c', 'c'],
                      'value1':[12,11,41,33,66,22,20]})

      sub-group  group  value1
      2020       a      12
      2030       a      11
      2040       a      41
      2030       b      33
      2040       b      66
      2030       c      22
      2040       c      20

df2= pd.DataFrame({'sub-group':['2020','2030','2040', '2020', '2030','2040','2030','2040'], 
                      'group':['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c'],
                      'value2':[10,20,30,15,45,60,12,36]})

      sub-group group  value2
      2020      a      10
      2030      a      20
      2040      a      30
      2020      b      15
      2030      b      45
      2040      b      60
      2030      c      12
      2040      c      36

I want to find valu1/value2 for each group and sub-group. Note that the number of observations might not match in two dataframes. for example, we have 2020/b in df2 but not in df1. I those cases a nan or 0 would work.

I was thinking that it should be possible with pd.groupby but I don't know how it works with two dataframes. Thanks.


Solution

  • For align groups and sub-group create MultiIndex Series by DataFrame.set_index, then divide by Series.div, last for 3 columns DataFrame add Series.reset_index:

    df = (df1.set_index(['group','sub-group'])['value1']
             .div(df2.set_index(['group','sub-group'])['value2'])
             .reset_index(name='out'))
    print (df)
      group sub-group       out
    0     a      2020  1.200000
    1     a      2030  0.550000
    2     a      2040  1.366667
    3     b      2020       NaN
    4     b      2030  0.733333
    5     b      2040  1.100000
    6     c      2030  1.833333
    7     c      2040  0.555556