Tags: python-3.x, pandas, dataframe, duplicates, data-analysis

Removing Duplicate Values not working - expected bool?


My data set looks something like this:

[image of the DataFrame]

I am trying two functions to clean up the df. The first removes all equal combos, meaning that if the same ids are combined, as in row 1 and row 3, one of those rows is removed. The second should then remove any duplicate in each column. It runs with no error, but the actual duplicates are not being removed?

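import numpy as np
import pandas as pd

def remove_dup_combos(df):
    # Sort the id values within each row so (a, b) and (b, a) look identical
    u = df.filter(like='id').values
    m = pd.DataFrame(np.sort(u, axis=1)).duplicated()
    df = df[~m]

    return df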



def remove_dups(df):
    df = df.drop_duplicates(['id1', 'id2'])

    return df

Solution

  • I believe you need this if you want to remove duplicates by both columns together:

    df = df.drop_duplicates(['id1', 'id2'])
    

    What you describe is different - removing duplicates separately, looking first at the first column and then at the second:

    df = df.drop_duplicates(['id1'], inplace = False)
    df = df.drop_duplicates(['id2'], inplace = False)
    

    The parameter inplace=False is the default in DataFrame.drop_duplicates, so it can be omitted:

    df = df.drop_duplicates(['id1'])
    df = df.drop_duplicates(['id2'])
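
To see why the two approaches give different results, here is a minimal sketch. The sample frame is made up for illustration (the original data was only posted as an image), and the names by_pair and by_column are hypothetical; only the id1 and id2 column names come from the question.

```python
import pandas as pd

# Hypothetical sample data, not the asker's real data set.
df = pd.DataFrame({
    "id1": [1, 1, 2, 3, 1],
    "id2": [10, 20, 10, 30, 10],
})

# A row is dropped only when the whole (id1, id2) pair has appeared before.
by_pair = df.drop_duplicates(["id1", "id2"])

# Rows with a repeated id1 are dropped first, then rows with a repeated id2.
by_column = df.drop_duplicates(["id1"]).drop_duplicates(["id2"])

print(by_pair)
print(by_column)
```

With this sample, the pair-wise version removes only the last row, because (1, 10) repeats, while the column-wise version keeps far fewer rows, because every repeated value in either column removes a row. If drop_duplicates(['id1', 'id2']) seems to do nothing, the likely reason is that no full pair repeats in the data.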