Search code examples
pythonpandasdataframemergedrop-duplicates

how to drop duplicates after merging two dataframes?


I have two dataframes ,

A= 
ID compponent weight
12  Cap        0.4
12  Pump       183
12  label      0.05
14  cap        0.6 

B=
ID compponent_B weight_B
12  Cap_B       0.7
12  Pump_B       189
12  label      0.05


when i do merge of this two dataframes based on the ID as a key ,I get


ID component weight component_B  weight_B
12  Cap        0.4   Cap_B        0,7
12  Cap        0.4   Pump_B       189
12  Cap        0.4   label        0.05
12  Pump       183   Cap_B        0,7
12  Pump       183   Pump_B       189
14  Pump       183   label        0.05
...

I understand that the fact that i have One ID pour the 3 lines genrates 9 lines if i do the merge , but how can i only keep three lines without losing information ,if i do a drop_duplicates on Component i'll lose informations about component_B. i want something like :


ID component weight  component_B  weight_B
12  Cap        0.4    Cap_B        0,7
12  Pump       183    Pump_B       189
12  labeL      0,05   label        0.05


anyone know how to do it ?


Solution

  • you can create a column with a cumcount per ID to be able to merge on ID and this new column so like:

    dfm = dfA.assign(cc=dfA.groupby('ID').cumcount())\
             .merge(dfB.assign(cc=dfB.groupby('ID').cumcount()), 
                    on=['ID', 'cc'], how='outer')
    print (dfm)
       ID compponent  weight  cc compponent_B  weight_B
    0  12        Cap    0.40   0        Cap_B      0.70
    1  12       Pump  183.00   1       Pump_B    189.00
    2  12      label    0.05   2        label      0.05
    3  14        cap    0.60   0          NaN       NaN