python csv jupyter pandasql

Is there any function to remove duplicate values in rows in jupyter?


I have a CSV file and need to remove the duplicate values under street_name. For example, a single row contains hwy-1w several times.

I used this query: joinedResult.groupby('roadId')['street_name'].apply(', '.join).reset_index().to_csv(f'./2{areaId}.csv', index = False)


Solution

  • If you want unique values per row, this question might be of help. If you want to keep the data in the same row and don't care about the order of the names afterwards, maybe this could help:

    df['street_name'] = df['street_name'].apply(lambda x: ', '.join(set(x.split(', '))))
    

    Converting to a set is a simple way to remove duplicates, but a set does not keep the original order of the values.

    If you need to preserve order, you can use a Counter, whose keys stay in first-seen order (Python 3.7+). It will be slower than using sets, though:

    from collections import Counter
    df['street_name'] = df['street_name'].apply(lambda x: ', '.join(Counter(x.split(', ')).keys()))
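As a further option, the order-preserving idea can be sketched with a plain dict instead of a Counter. The helper names `unique_in_order` and `dedupe_joined`, and the sample value `main st`, are made up for illustration and are not part of the original question. The block is pure Python so it runs without pandas; the comments show where it could plug into the question's DataFrame.

```python
def unique_in_order(items):
    """Return the distinct items, keeping the first occurrence of each."""
    # dict keys are unique and keep insertion order (guaranteed since Python 3.7).
    return list(dict.fromkeys(items))


def dedupe_joined(text, sep=", "):
    """Remove repeated entries from a separator-joined string."""
    # Strip stray whitespace and skip empty pieces before deduplicating.
    parts = [part.strip() for part in text.split(sep)]
    return sep.join(unique_in_order(part for part in parts if part))


# Hypothetical sample value, similar to the row described in the question.
print(dedupe_joined("hwy-1w, hwy-1w, main st, hwy-1w"))  # hwy-1w, main st

# With pandas, the same helpers could be applied to an already joined column:
#   df['street_name'] = df['street_name'].apply(dedupe_joined)
# or during the groupby from the question, before the names are joined:
#   joinedResult.groupby('roadId')['street_name'].agg(
#       lambda s: ', '.join(unique_in_order(s))).reset_index()
```

Deduplicating inside the groupby avoids building the repeated string in the first place, assuming every value in street_name is a string with no missing entries.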