I have DataFrame in Python Pandas like below:
data types:
ID - int
TYPE - object
TG_A - int
TG_B - int
ID | TYPE | TG_A | TG_B |
---|---|---|---|
111 | A | 1 | 0 |
111 | B | 1 | 0 |
222 | B | 1 | 0 |
222 | A | 1 | 0 |
333 | B | 0 | 1 |
333 | A | 0 | 1 |
And I need to drop duplicates in above DataFrame, so as to:
So, as a result I need something like below:
ID | TYPE | TG_A | TG_B
----|------|------|-----
111 | A | 1 | 0
222 | A | 1 | 0
333 | B | 0 | 1
How can I do that in Python Pandas ?
You can use two boolean masks and groupby.idxmax
to get the first non matching value:
m1 = df['TYPE'].eq('B') & df['TG_A'].eq(1)
m2 = df['TYPE'].eq('A') & df['TG_B'].eq(1)
out = df.loc[(~(m1|m2)).groupby(df['ID']).idxmax()]
Output:
ID TYPE TG_A TG_B
0 111 A 1 0
3 222 A 1 0
4 333 B 0 1