I have two dataframes ,
A=
ID compponent weight
12 Cap 0.4
12 Pump 183
12 label 0.05
14 cap 0.6
B=
ID compponent_B weight_B
12 Cap_B 0.7
12 Pump_B 189
12 label 0.05
when i do merge of this two dataframes based on the ID as a key ,I get
ID component weight component_B weight_B
12 Cap 0.4 Cap_B 0,7
12 Cap 0.4 Pump_B 189
12 Cap 0.4 label 0.05
12 Pump 183 Cap_B 0,7
12 Pump 183 Pump_B 189
14 Pump 183 label 0.05
...
I understand that the fact that i have One ID pour the 3 lines genrates 9 lines if i do the merge , but how can i only keep three lines without losing information ,if i do a drop_duplicates on Component i'll lose informations about component_B. i want something like :
ID component weight component_B weight_B
12 Cap 0.4 Cap_B 0,7
12 Pump 183 Pump_B 189
12 labeL 0,05 label 0.05
anyone know how to do it ?
you can create a column with a cumcount
per ID to be able to merge
on ID and this new column so like:
dfm = dfA.assign(cc=dfA.groupby('ID').cumcount())\
.merge(dfB.assign(cc=dfB.groupby('ID').cumcount()),
on=['ID', 'cc'], how='outer')
print (dfm)
ID compponent weight cc compponent_B weight_B
0 12 Cap 0.40 0 Cap_B 0.70
1 12 Pump 183.00 1 Pump_B 189.00
2 12 label 0.05 2 label 0.05
3 14 cap 0.60 0 NaN NaN