Search code examples
pythonpandasdataframejoinmulti-index

Joining two MultiIndex DataFrames with different number of index levels


I have a pandas dataframe:

                               emp    laborforce  emp_rate
occ statefip quarter                                      
10  1        0        6.561213e+06  7.017537e+06  0.934974
             4        8.580723e+06  9.114996e+06  0.941385
             8        8.588012e+06  9.102831e+06  0.943444
             12       2.093297e+06  2.220923e+06  0.942535
    2        0        6.561208e+06  7.017527e+06  0.934974

Now I want to merge into this dataset the average emp_rate for each (occ, statefip). I tried

df2 = df1.groupby(level=[0, 1])['emp_rate'].mean()
df2.name = 'emp_rate_mean'
df1.join(df2, how='inner')
NotImplementedError: merging with more than one level overlap on a multi-index is not implemented

Apparently, this type of join would work since pandas 0.14 if the second dataframe was single-indexed. It is not. What's the proper approach in this case?


Solution

  • use GroupBy.transform():

    In [102]: df['emp_rate_avg'] = df.groupby(level=[0, 1])['emp_rate'].transform('mean')
    
    In [103]: df
    Out[103]:
                                 emp  laborforce  emp_rate  emp_rate_avg
    occ  statefip quarter
    10.0 1.0      0        6561213.0   7017537.0  0.934974      0.940584
                  4        8580723.0   9114996.0  0.941385      0.940584
                  8        8588012.0   9102831.0  0.943444      0.940584
                  12       2093297.0   2220923.0  0.942535      0.940584
         2.0      0        6561208.0   7017527.0  0.934974      0.934974