I have a parent_df and a child_df as shown below.
parent_df:
x y colA
x1 y1 A1
x1 y2 A2
x2 y1 A3
x2 y2 A4
child_df:
p q colB colC
p1 q1 B1 C1
p1 q2 B2 C2
p2 q1 B3 C3
p2 q2 B4 C4
I want to either modify parent_df or create a new parent_df by putting child_df into parent_df at a particular row in parent_df (x2, y1) so that:
parent_df:
x y p q colA colB colC
x1 y1 A1 NA NA
x1 y2 A2 NA NA
x2 y1 p1 q1 A3 B1 C1
p1 q2 A3 B2 C2
p2 q1 A3 B3 C3
p2 q2 A3 B4 C4
x2 y2 A4 NA NA
Is there a way to do this?
I think you need merge
with sort_index
:
print (parent_df)
colA
x y
x1 y1 A1
y2 A2
x2 y1 A3
y2 A4
print (child_df)
colB colC
p q
p1 q1 B1 C1
q2 B2 C2
p2 q1 B3 C3
q2 B4 C4
#create new columns
child_df['x'] = 'x2'
child_df['y'] = 'y1'
#set index by new columns
child_df = child_df.reset_index().set_index(['x','y'])
print (child_df)
p q colB colC
x y
x2 y1 p1 q1 B1 C1
y1 p1 q2 B2 C2
y1 p2 q1 B3 C3
y1 p2 q2 B4 C4
df = pd.merge(parent_df, child_df, left_index=True, right_index=True, how='outer')
#replace NaN in p. q columns with '', append and sort index
df = df.fillna({'p':'','q':''}).set_index(['p','q'], append=True).sort_index()
print (df)
colA colB colC
x y p q
x1 y1 A1 NaN NaN
y2 A2 NaN NaN
x2 y1 p1 q1 A3 B1 C1
q2 A3 B2 C2
p2 q1 A3 B3 C3
q2 A3 B4 C4
y2 A4 NaN NaN