Dataframe column contains lists of strings. Dataframe needs to be transformed to keep only rows with unique lists in column 'Final'
I have a dataframe as below:
   string1          string2          Final
1  [abc, ncx]       [qwe, rty]       [apple, mango]
2  [uio, pas, dfg]  [zxc, vbg, dfv]  [banana, grapes, apple]
3  [ncx, abc]       [rty, qwe]       [mango, apple]
4  [uio, pas, dfg]  [zxc, vbg, dfv]  [banana, grapes, apple]
5  [uio, dfg]       [zxc, dfv]       [banana, apple]
6  [ncx, abc]       [rty, qwe]       [mango, apple]
Duplicate lists in the df['Final'] column must be dropped, so that the dataframe contains only rows with a unique list in the 'Final' column.
Desired output dataframe:
   string1          string2          Final
1  [abc, ncx]       [qwe, rty]       [apple, mango]
2  [uio, pas, dfg]  [zxc, vbg, dfv]  [banana, grapes, apple]
3  [ncx, abc]       [rty, qwe]       [mango, apple]
4  [uio, dfg]       [zxc, dfv]       [banana, apple]
Invert the mask created by Series.duplicated with ~ and use it for filtering in boolean indexing. Because lists are not hashable, first convert them to tuples:
df = df[~df['Final'].apply(tuple).duplicated()]
print(df)
   string1          string2          Final
1  [abc, ncx]       [qwe, rty]       [apple, mango]
2  [uio, pas, dfg]  [zxc, vbg, dfv]  [banana, grapes, apple]
3  [ncx, abc]       [rty, qwe]       [mango, apple]
5  [uio, dfg]       [zxc, dfv]       [banana, apple]
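To see why the conversion to tuples is needed, here is a minimal plain-Python sketch (no pandas involved). The variable names are only illustrative and do not come from the question. It checks that a list cannot be hashed while a tuple of strings can, and that two tuples with the same items in a different order are not equal, which is why rows 1 and 3 are both kept above.

```python
# Lists are mutable, so Python refuses to hash them.
try:
    hash(["apple", "mango"])
    list_is_hashable = True
except TypeError:
    list_is_hashable = False

# Tuples of strings are hashable, so duplicate detection can work on them.
first = tuple(["apple", "mango"])
reordered = tuple(["mango", "apple"])
tuple_hash_works = isinstance(hash(first), int)

# Tuple equality is order-sensitive: the same items in another order differ.
same_as_reordered = first == reordered

print(list_is_hashable, tuple_hash_works, same_as_reordered)
# prints: False True False
```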
If [apple, mango] should count as a duplicate of [mango, apple] (order is not important), change tuple to frozenset:
df = df[~df['Final'].apply(frozenset).duplicated()]
print(df)
   string1          string2          Final
1  [abc, ncx]       [qwe, rty]       [apple, mango]
2  [uio, pas, dfg]  [zxc, vbg, dfv]  [banana, grapes, apple]
5  [uio, dfg]       [zxc, dfv]       [banana, apple]
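For anyone who wants to reproduce this, below is a self-contained sketch that rebuilds the sample data and applies both variants. It assumes the cells hold real Python lists of strings and that the index runs from 1 to 6 as shown in the question. The names ordered, unordered and unordered_sorted are only illustrative. The last variant, tuple(sorted(...)), is an alternative not mentioned above: frozenset ignores repeated items inside a list, so [apple, apple] would match [apple], whereas a sorted tuple ignores order but keeps repeats.

```python
import pandas as pd

# Sample data rebuilt from the question (assumed to be lists of strings).
df = pd.DataFrame(
    {
        "string1": [["abc", "ncx"], ["uio", "pas", "dfg"], ["ncx", "abc"],
                    ["uio", "pas", "dfg"], ["uio", "dfg"], ["ncx", "abc"]],
        "string2": [["qwe", "rty"], ["zxc", "vbg", "dfv"], ["rty", "qwe"],
                    ["zxc", "vbg", "dfv"], ["zxc", "dfv"], ["rty", "qwe"]],
        "Final": [["apple", "mango"], ["banana", "grapes", "apple"],
                  ["mango", "apple"], ["banana", "grapes", "apple"],
                  ["banana", "apple"], ["mango", "apple"]],
    },
    index=range(1, 7),
)

# Order matters: [apple, mango] and [mango, apple] are different.
ordered = df[~df["Final"].apply(tuple).duplicated()]

# Order ignored, repeated items inside one list also ignored.
unordered = df[~df["Final"].apply(frozenset).duplicated()]

# Order ignored, repeated items inside one list still count.
unordered_sorted = df[~df["Final"].apply(lambda x: tuple(sorted(x))).duplicated()]

print(list(ordered.index))           # [1, 2, 3, 5]
print(list(unordered.index))         # [1, 2, 5]
print(list(unordered_sorted.index))  # [1, 2, 5]
```

The filtered frames keep the original index labels. If a consecutive index is wanted, reset_index(drop=True) renumbers the rows starting from 0.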