I am trying to merge multiple columns into a single column while dropping duplicates and dropping null values but keeping the rows. What I have:
Index | A | B | C
0 | nan | nan | nan
1 | 1 | 1 | nan
2 | x | 2 | nan
What I need:
Index | New_col
0 |
1 | 1
2 | x,2
What I have tried:
df= pd.DataFrame(np.array([['nan', 'nan', 'nan'], ['nan', 1, 1], ['nan', 'x', 2]]), columns=['A', 'B', 'C'])
df['a]=df[['a','b','c]].agg(', '.join, axis=1)
Also tried: .stack().unstack() and .join but I cannot get these to remove duplicates by row.
Not sure if set preserves the order of incoming elements, but we can chain some operations together
import pandas as pd
import numpy as np
from io import StringIO
df = pd.read_csv(StringIO(
"""A|B|C
||
1|1|
x|2|"""), sep="|", dtype="object")
>>> df.apply(lambda x: ",".join(set([str(i) for i in x if pd.notnull(i)])), axis=1)
0
1 1
2 x,2
dtype: object