I have two dataframes: df1= and df2=
I want to concatenate the two dataframes at all rows and columns while the output has unique rows for the first two columns 'parameter' and 'date' and unique columns for the other columns.
Recently I asked a similar question here. Giving a try to the accepted solution there, I see an extra row for the date '2023-01-01':
code:
df1 = pd.DataFrame({ 'parameter' : ['A', 'B'], 'date' : ['2023-01-01', '2023-01-02'], 'column2' : ['A2_1', 'B2_2'], 'column3' : ['A3_1', 'B3_2'], 'column4' : ['A4_1', 'B4_2'], 'column5' : ['A5_1', 'B5_2'] })
df2 = pd.DataFrame({ 'parameter' : ['A', 'C'], 'date' : ['2023-01-01', '2023-01-03'], 'column4' : ['A4_1_1', 'C4_3'], 'column5' : ['A5_1_1', 'C5_3'], 'column6' : ['A6_2', 'C6_3'], 'column7' : ['A7_2', 'C7_3'] })
res = pd.concat([df1, df2]).sort_values(['parameter', 'date']).fillna('').reset_index(drop=True)
print(res)
Output:
parameter date column2 column3 column4 column5 column6 column7
0 A 2023-01-01 A2_1 A3_1 A4_1 A5_1
1 A 2023-01-01 A4_1_1 A5_1_1 A6_2 A7_2
2 B 2023-01-02 B2_2 B3_2 B4_2 B5_2
3 C 2023-01-03 C4_3 C5_3 C6_3 C7_3
But I expect to get the following dataframe as the resulting output:
Please note that for the repeated cells, the value of the first dataframe should be kept.
Can anyone suggest a way to get the above output dataframe preferably without using loops and conditions?
You could try grouping them by the parameter and date and taking the first non-null value from each group.
pd.concat([df1,df2]).sort_values(by=['parameter','date']).groupby(['parameter','date']).first().reset_index()