Unique values across columns row-wise in pandas with missing values

I have a dataframe like

import pandas as pd
import numpy as np

df = pd.DataFrame({"Col1": ['A', np.nan, 'B', 'B', 'C'],
                  "Col2": ['A', 'B', 'B', 'A', 'C'],
                  "Col3": ['A', 'B', 'C', 'A', 'C']})

I want to get the unique combinations across columns for each row and create a new column with those values, excluding the missing values.

The code I have right now to do this is

def handle_missing(s):
    
    return np.unique(s[s.notnull()])
    
def unique_across_rows(data):
    
    
    unique_vals = data.apply(handle_missing, axis = 1)
    
    # numpy unique sorts the values automatically
    merged_vals = unique_vals.apply(lambda x: x[0] if len(x) == 1 else '_'.join(x))
    
    return merged_vals

df['Combos'] = unique_across_rows(df)

This returns the expected output:

       Col1    Col2  Col3   Combos
  0       A       A     A       A
  1     NaN       B     B       B
  2       B       B     C       B_C
  3       B       A     A       A_B
  4       C       C     C       C

It seems to me that there should be a more vectorized approach that exists within Pandas to do this: how could I do that?

Solution

You can try a simple list comprehension which might be more efficient for larger dataframes:

df['combos'] = ['_'.join(sorted(k for k in set(v) if pd.notnull(k))) for v in df.values]

Or you can wrap the above list comprehension in a more readable function:

def combos():
    for v in df.values:
        unique = set(filter(pd.notnull, v))
        yield '_'.join(sorted(unique))

df['combos'] = list(combos())

  Col1 Col2 Col3 combos
0    A    A    A      A
1  NaN    B    B      B
2    B    B    C    B_C
3    B    A    A    A_B
4    C    C    C      C