Horizontally compare two dataframes and inset blank rows for differences. I need to see blank rows on the second dataframe where there are no id values
df1 =
user id Amount
John 1 1000
Tom 2 2000
Tom 2 3000
Claire 3 4000
Mary 4 5000
df2 =
user id Amount
John 1 1000
Tom 2 2000
Claire 3 4000
Mary 4 5000
To Horizontally compare I am using
pd.concat([df1, df2],
axis=1,
Result=
user id Amount user id Amount
John 1 1000 John 1 1000
Tom 2 2000 Tom 2 2000
Tom 2 3000 Claire 3 4000
Claire 3 4000 Mary 4 5000
Mary 4 5000
What I expect to happen:
user id Amount user id Amount
John 1 1000 John 1 1000
Tom 2 2000 Tom 2 2000
Tom 2 3000
Claire 3 4000 Claire 3 4000
Mary 4 5000 Mary 4 5000
This worked for me.
First I created a data frame to find the duplicated IDs
duplicate = df1[df1['id'].duplicated()]
Then I looked up this new dataframe to determine what IDs I needed to insert an empty line after
df2= df2.assign(result=df2['id'].isin(duplicate['id']).astype(int))
I created a result column to identify the rows where I needed to insert the empty line
a = (df2['result'] == 1)
df3 = df2.copy() #make a copy because we want to be safe here
for i in df2.loc[a].index:
empty_row = pd.DataFrame([], index=[i]) #creating the empty data
j = i + 1 #just to get things easier to read
df3 = pd.concat([df3.ix[:i], empty_row, df3.ix[j:]], sort=False) #slicing the df
df3 = df3.reset_index(drop=True,) #reset the index
Once the empty rows are inserted on df3 then I used pd.concat to display both dfs side by side
df_all =pd.concat([df1, df3], axis=1, sort=False)