Search code examples
pythonanalyticscategorical

I have a dataset where I need to convert some columns into single categorical variables & concatenating multi true values


I have a dataset with unique ID with columns which have several bool values for each ID. Therefore, I need to convert these columns into a single categorical variables concatenating multiple true bool values. And if out of mentioned bool ID have 3 true values we assign to categories as "Win"

ID BoolCol_1 BoolCol_2 BoolCol_3 BoolCol_4 Other Col 1 Other Col 2
1 1 2 2 1 x Y
2 2 1 1 1 A b

1 -> True 2 -> False

ID are unique.

I am not able to think in my head how to solve this puzzle


Solution

  • Use the following approach:

    bool_cols = ['BoolCol_1', 'BoolCol_2', 'BoolCol_3', 'BoolCol_4']
    cnts = df[bool_cols].stack().groupby(level=0).value_counts().unstack()[1]
    df['cat_col'] = pd.Series(np.where(cnts >= 3, 'W', 'L'), dtype='category')
    

    Now cat_col is categorical column with fixed values W (win), L (lose)


    In [229]: df
    Out[229]: 
       ID  BoolCol_1  BoolCol_2  BoolCol_3  BoolCol_4 Other Col 1 Other Col 2 cat_col
    0   1          1          2          2          1          x            Y       L
    1   2          2          1          1          1          A            b       W