python pandas dummy-variable

Pythonic way of making a dummy column from the sum of two values


I have a dataframe with one column called label, which takes the values [0,1,2,3,4,5,6,8,9]. I would like to make dummy columns from it, but with some labels joined together; for example, I want dummy_012 to be 1 if the observation has label 0, 1, or 2.

If I use the command df2 = pd.get_dummies(df, columns=['label']), it creates 9 columns, one for each label.

I know I can use df2['dummy_012']=df2['dummy_0']+df2['dummy_1']+df2['dummy_2'] after that to turn it into one joint column, but I want to know if there's a more pythonic way of doing it (or some function where I can just change the parameters to control the joins).


Solution

  • Maybe this approach can give you an idea:

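    # Each string lists the single-digit labels that share one dummy column.
    groups = ['012', '345', '6789']
    for gp in groups:
        # Tag every row whose label is one of the digits in this group.
        df.loc[df['Label'].isin([int(x) for x in gp]), 'Label_Group'] = f'dummies_{gp}'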
    

    Output:

       Label   Label_Group
    0      0   dummies_012
    1      1   dummies_012
    2      2   dummies_012
    3      3   dummies_345
    4      4   dummies_345
    5      5   dummies_345
    6      6  dummies_6789
    7      8  dummies_6789
    8      9  dummies_6789
    

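    Then apply get_dummies to the new column:

    # dtype=int keeps the 0/1 output shown below; pandas 2.0+ returns booleans by default.
    df_dummies = pd.get_dummies(df['Label_Group'], dtype=int)

    Output:

       dummies_012  dummies_345  dummies_6789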
    0            1            0             0
    1            1            0             0
    2            1            0             0
    3            0            1             0
    4            0            1             0
    5            0            1             0
    6            0            0             1
    7            0            0             1
    8            0            0             1
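
A variation on the same idea, in case the labels are not all single digits: the string-splitting trick above relies on each character being one label, so an explicit mapping may be easier to adjust. This is only a sketch. The `groups` dictionary and the `label_to_group` name are illustrative assumptions, and the sample dataframe is rebuilt from the values in the question.

```python
import pandas as pd

# Sample data mirroring the question: labels 0-9 with 7 absent.
df = pd.DataFrame({'Label': [0, 1, 2, 3, 4, 5, 6, 8, 9]})

# Hypothetical grouping: dummy column name -> labels it should cover.
groups = {
    'dummies_012': [0, 1, 2],
    'dummies_345': [3, 4, 5],
    'dummies_6789': [6, 7, 8, 9],
}

# Invert the grouping into a label -> column-name lookup.
label_to_group = {label: name for name, labels in groups.items() for label in labels}

# Map each label to its group name, then one-hot encode the group names.
# dtype=int requests 0/1 output instead of the boolean default in pandas 2.0+.
df_dummies = pd.get_dummies(df['Label'].map(label_to_group), dtype=int)

# If only one joint column is needed, isin builds it directly without get_dummies.
df['dummy_012'] = df['Label'].isin([0, 1, 2]).astype(int)
```

Changing which labels are joined then only means editing `groups`. Labels missing from the mapping become NaN and get all-zero dummy rows, which may or may not be what you want.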