python-3.x pandas dataframe duplicates data-analysis

Removing Duplicate Values not working - expected bool?

My data set looks something like

I am trying two functions to clean up the df. The first removes all equal combos: if the same ids are combined in two rows, like row 1 and row 3, one of those rows is removed. The second is supposed to remove any duplicate within each column. It runs with no error, but the duplicates are not actually removed. Why?

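import numpy as np
import pandas as pd

def remove_dup_combos(df):
    # Sort the id columns within each row so (a, b) and (b, a) compare equal
    u = df.filter(like='id').values
    m = pd.DataFrame(np.sort(u, axis=1)).duplicated()
    # m carries a fresh RangeIndex, so mask with the raw array to avoid index alignment
    return df[~m.values]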



def remove_dups(df):
    # Drop rows whose (id1, id2) pair has already appeared
    df = df.drop_duplicates(['id1', 'id2'])
    return df

Solution

I believe you need this if you want to remove rows that are duplicated in both columns together:

df = df.drop_duplicates(['id1', 'id2'])
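
A minimal sketch of what this does, on made-up data (the `val` column and all values are assumptions, since the original sample is not shown). A row is dropped only when the same (id1, id2) pair has already appeared:

```python
import pandas as pd

# Hypothetical sample data, not the asker's real data set
df = pd.DataFrame({
    'id1': [1, 1, 2, 3, 1],
    'id2': [10, 20, 10, 30, 10],
    'val': list('abcde'),
})

# Rows are compared on the (id1, id2) pair as a whole
by_pair = df.drop_duplicates(['id1', 'id2'])
print(by_pair)
```

Only the last row is removed here, because the pair (1, 10) already appears in the first row. Values that repeat within id1 alone or id2 alone stay in the result.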

Your solution is different: it removes duplicates separately, first by the first column and then by the second:

df = df.drop_duplicates(['id1'], inplace = False)
df = df.drop_duplicates(['id2'], inplace = False)

inplace=False is already the default in DataFrame.drop_duplicates, so it can be left out:

df = df.drop_duplicates(['id1'])
df = df.drop_duplicates(['id2'])
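
To show how the per-column version behaves, here is a small sketch on made-up data (the `val` column, the values, and the names `step1` and `per_column` are assumptions for illustration). Each call keeps only the first row for every distinct value in that one column:

```python
import pandas as pd

# Hypothetical sample data, not the asker's real data set
df = pd.DataFrame({
    'id1': [1, 1, 2, 3, 1],
    'id2': [10, 20, 10, 30, 10],
    'val': list('abcde'),
})

# Step 1: keep the first row for each distinct id1
step1 = df.drop_duplicates(['id1'])
# Step 2: of what is left, keep the first row for each distinct id2
per_column = step1.drop_duplicates(['id2'])
print(per_column)
```

With this data only rows 0 and 3 survive, so this approach removes far more rows than dropping duplicates on the (id1, id2) pair.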