Search code examples
pythonpandasdataframemergemulti-index

Insert a pandas multi-index dataframe into another multi-index data frame at a particular location


I have a parent_df and a child_df as shown below.

parent_df:
x  y  colA
x1 y1 A1
x1 y2 A2
x2 y1 A3
x2 y2 A4

child_df:
p  q  colB colC
p1 q1 B1   C1
p1 q2 B2   C2
p2 q1 B3   C3
p2 q2 B4   C4

I want to either modify parent_df or create a new parent_df by putting child_df into parent_df at a particular row in parent_df (x2, y1) so that:

parent_df:
x  y  p  q  colA colB colC
x1 y1       A1   NA   NA
x1 y2       A2   NA   NA
x2 y1 p1 q1 A3   B1   C1
      p1 q2 A3   B2   C2
      p2 q1 A3   B3   C3
      p2 q2 A3   B4   C4
x2 y2       A4   NA   NA

Is there a way to do this?


Solution

  • I think you need merge with sort_index:

    print (parent_df)
          colA
    x  y      
    x1 y1   A1
       y2   A2
    x2 y1   A3
       y2   A4
    
    print (child_df)
          colB colC
    p  q           
    p1 q1   B1   C1
       q2   B2   C2
    p2 q1   B3   C3
       q2   B4   C4
    
    #create new columns
    child_df['x'] =  'x2'
    child_df['y'] =  'y1'
    #set index by new columns
    child_df = child_df.reset_index().set_index(['x','y'])
    print (child_df)
            p   q colB colC
    x  y                   
    x2 y1  p1  q1   B1   C1
       y1  p1  q2   B2   C2
       y1  p2  q1   B3   C3
       y1  p2  q2   B4   C4
    

    df = pd.merge(parent_df, child_df, left_index=True, right_index=True, how='outer')
    #replace NaN in p. q columns with '', append and sort index
    df = df.fillna({'p':'','q':''}).set_index(['p','q'], append=True).sort_index()
    print (df)
                colA colB colC
    x  y  p  q                
    x1 y1         A1  NaN  NaN
       y2         A2  NaN  NaN
    x2 y1 p1 q1   A3   B1   C1
             q2   A3   B2   C2
          p2 q1   A3   B3   C3
             q2   A3   B4   C4
       y2         A4  NaN  NaN