My data set looks something like
I am trying two functions to clean up the df. The first removes all equal combos: if the same ids are combined in a different order, like row 1 and row 3, that combo is removed. The second function should then remove any duplicate in each column. It runs with no error, but the actual duplicates are not being removed?
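import numpy as np
import pandas as pd

def remove_dup_combos(df):
    # Sort the id columns within each row so (a, b) and (b, a) compare equal
    u = df.filter(like='id').values
    m = pd.DataFrame(np.sort(u, axis=1)).duplicated()
    df = df[~m.to_numpy()]  # positional mask, independent of df's index
    return df

def remove_dups(df):
    df = df.drop_duplicates(['id1', 'id2'])
    return df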
I believe you need this if you want to remove rows that are duplicated in both columns together:
df = df.drop_duplicates(['id1', 'id2'])
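To see what this call does, here is a minimal sketch. The sample values are made up for illustration (the data from the question is not reproduced here); only the column names id1 and id2 come from the question. A row is dropped only when the whole (id1, id2) pair has already appeared, so repeated values inside a single column can survive:

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "id1": [1, 1, 2, 1],
    "id2": [5, 6, 5, 5],
})

# Only the last row is dropped, because the pair (1, 5) already appeared.
by_pair = df.drop_duplicates(["id1", "id2"])
print(by_pair)

# id1 still contains 1 twice and id2 still contains 5 twice.
print(by_pair["id1"].duplicated().any(), by_pair["id2"].duplicated().any())
```

This may be why the function seems to do nothing: if no full pair repeats, nothing is removed.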
Your solution is different: it removes duplicates separately, first by the first column and then by the second column:
df = df.drop_duplicates(['id1'], inplace = False)
df = df.drop_duplicates(['id2'], inplace = False)
The parameter inplace=False is the default in DataFrame.drop_duplicates, so it can be removed:
df = df.drop_duplicates(['id1'])
df = df.drop_duplicates(['id2'])
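A small sketch of the two-step version, again on made-up values (an assumption, not the data from the question). Each call keeps the first occurrence of a value in one column, so after both steps neither column contains a repeat:

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "id1": [1, 1, 2, 3],
    "id2": [5, 6, 5, 7],
})

# Step 1: keep the first row for each id1 value (drops the second id1 == 1).
step1 = df.drop_duplicates(["id1"])

# Step 2: among the remaining rows, keep the first row for each id2 value.
step2 = step1.drop_duplicates(["id2"])
print(step2)
```

Note that the order of the two steps can change the result: a row removed in the first step is gone even if its id2 value was unique.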