I have data like the following:
Index ID data1 data2 ...
0 123 0 NaN ...
1 123 0 1 ...
2 456 NaN 0 ...
3 456 NaN 0 ...
...
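I need to drop every row whose information is already contained in another row with the same ID: the row matches the other row wherever it has a value, and at most has NaN where the other row has data (exact duplicates count too).
In the example above, row 0 and one of rows 2 and 3 should be removed.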
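The rule above can be checked directly. A row is redundant when another row with the same ID has the same value in every column where the redundant row has a value. Below is a minimal sketch of that check, assuming pandas and NumPy are available. is_dominated and drop_dominated are hypothetical helper names, not pandas API:

```python
import numpy as np
import pandas as pd


def is_dominated(a: pd.Series, b: pd.Series) -> bool:
    """True if every non-NaN value in row `a` also appears, with the same value, in row `b`."""
    known = a.notna()
    return bool((b[known] == a[known]).all())


def drop_dominated(df: pd.DataFrame, key: str = "ID") -> pd.DataFrame:
    """Drop rows that carry no information beyond another row with the same key."""
    # Identical rows dominate each other; drop_duplicates keeps the first copy
    # (it treats NaN as equal to NaN when comparing rows).
    df = df.drop_duplicates()
    to_drop = []
    for _, group in df.groupby(key):
        for i, row in group.iterrows():
            # After de-duplication, a row dominated by another distinct row
            # is strictly less informative, so it can go.
            if any(is_dominated(row, other)
                   for j, other in group.iterrows() if j != i):
                to_drop.append(i)
    return df.drop(index=to_drop)


df = pd.DataFrame(
    {"ID": [123, 123, 456, 456],
     "data1": [0, 0, np.nan, np.nan],
     "data2": [np.nan, 1, 0, 0]},
)
print(drop_dominated(df))  # keeps rows 1 and 2
```

This compares every pair of rows inside a group, so it is quadratic per ID and best suited to small groups. Unlike forward/back-filling, it never merges partial rows: two rows with the same ID such as data1=0, data2=NaN and data1=NaN, data2=1 are both kept, because neither contains the other.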
My best attempt so far, which is rather slow and also doesn't work:
df.groupby(by='ID').fillna(method='ffill',inplace=True).fillna(method='bfill',inplace=True)
df.drop_duplicates(inplace=True)
How can I best accomplish this goal?
Your approach seems fine; only the in-place modification doesn't work here, because the groupby operates on a copy of the data rather than on df itself. Assign the result instead:
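cols = df.columns.drop('ID').tolist()
# ffill and bfill both run within each ID group; a frame-wide bfill could pull values from the next ID
df[cols] = df.groupby('ID')[cols].transform(lambda s: s.ffill().bfill())
df.drop_duplicates()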
ID data1 data2
0 123 0.0 1.0
2 456 NaN 0.0
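Whatever fill is used, the back-fill has to happen per ID. A back-fill applied to the whole frame after the grouped forward-fill can borrow values from the next ID. With the question's row order the result happens to be correct, so the sketch below uses a made-up reordering of the same rows (ID 456 first) to show the difference. It assumes a pandas version with GroupBy.ffill and GroupBy.transform, and avoids fillna(method=...), which recent pandas versions deprecate:

```python
import numpy as np
import pandas as pd

# The question's rows, reordered so that ID 456 comes before ID 123.
df = pd.DataFrame(
    {"ID": [456, 456, 123, 123],
     "data1": [np.nan, np.nan, 0, 0],
     "data2": [0, 0, np.nan, 1]}
)
cols = ["data1", "data2"]

# Grouped ffill, then a frame-wide bfill: the NaNs in data1 for ID 456
# are back-filled from the ID 123 rows below them.
leaky = df.groupby("ID")[cols].ffill().bfill()
print(leaky.loc[1, "data1"])  # 0.0, borrowed from another ID

# Group-wise fill: ffill and bfill both stay inside each ID.
filled = df.copy()
filled[cols] = df.groupby("ID")[cols].transform(lambda s: s.ffill().bfill())
result = filled.drop_duplicates()
print(result)
```

In the group-wise version, data1 for ID 456 stays NaN because that group has no value to fill from, and drop_duplicates then keeps one row per ID (index 0 and 2).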