Search code examples
pythonpandasdataframemergeconcatenation

Concatenating two Pandas dataframes while keeping the values from the first one for repeated cells


I have two dataframes: df1= and df2=

I want to concatenate the two dataframes at all rows and columns while the output has unique rows for the first two columns 'parameter' and 'date' and unique columns for the other columns.

Recently I asked a similar question here. Giving a try to the accepted solution there, I see an extra row for the date '2023-01-01':

code:

df1 = pd.DataFrame({ 'parameter' : ['A', 'B'], 'date' : ['2023-01-01', '2023-01-02'], 'column2' : ['A2_1', 'B2_2'], 'column3' : ['A3_1', 'B3_2'], 'column4' : ['A4_1', 'B4_2'], 'column5' : ['A5_1', 'B5_2'] })
df2 = pd.DataFrame({ 'parameter' : ['A', 'C'], 'date' : ['2023-01-01', '2023-01-03'], 'column4' : ['A4_1_1', 'C4_3'], 'column5' : ['A5_1_1', 'C5_3'], 'column6' : ['A6_2', 'C6_3'], 'column7' : ['A7_2', 'C7_3'] })
res = pd.concat([df1, df2]).sort_values(['parameter', 'date']).fillna('').reset_index(drop=True)
print(res)

Output:

  parameter        date column2 column3 column4 column5 column6 column7
0         A  2023-01-01    A2_1    A3_1    A4_1    A5_1                
1         A  2023-01-01                  A4_1_1  A5_1_1    A6_2    A7_2
2         B  2023-01-02    B2_2    B3_2    B4_2    B5_2                
3         C  2023-01-03                    C4_3    C5_3    C6_3    C7_3

But I expect to get the following dataframe as the resulting output:

df_result=

Please note that for the repeated cells, the value of the first dataframe should be kept.

Can anyone suggest a way to get the above output dataframe preferably without using loops and conditions?


Solution

  • You could try grouping them by the parameter and date and taking the first non-null value from each group.

    pd.concat([df1,df2]).sort_values(by=['parameter','date']).groupby(['parameter','date']).first().reset_index()