Search code examples
python-3.xpandasdataframefeature-engineering

Combine columns in pandas to create a new column


Hello I am working on pandas dataframe and I want to create a column combining multiple columns and applying condition on them and I am looking for a smart way to do it.

Suppose the data frame looks as

A   B   C   D
1   0   0   0
0   1   0   0
0   0   1   0
1   0   1   0
1   1   1   0
0   0   1   1

My output column should be as below

A   B   C   D   Output_col
1   0   0   0   A
0   1   0   0   B
0   0   1   0   C
1   0   1   0   A_C
1   1   1   0   A_B_C
0   0   1   1   C_D

I can certainly achieve this using below code but then I have to do it for every column.

test['Output_col'] = test.A.apply(lambda x: A if x > 0 else 0)

I was wondering if there is a way where I could achieve this without applying to every column if I have very large number of columns.

Thanks in advance !!


Solution

  • Use DataFrame.apply + join. Select column names using x.index( note that axis = 1 is used) + boolean indexing with Series.eq to filter the selected columns :

    test['Output_col']=test.apply(lambda x: '_'.join(x.index[x.eq(1)]),axis=1)
    print(test)
    

       A  B  C  D Output_col
    0  1  0  0  0          A
    1  0  1  0  0          B
    2  0  0  1  0          C
    3  1  0  1  0        A_C
    4  1  1  1  0      A_B_C
    5  0  0  1  1        C_D
    

    To apply only a list of columns:

    my_list_columns=['enter element of your list']
    test['Output_col']=test[my_list_columns].apply(lambda x: '_'.join(x.index[x.eq(1)]),axis=1)
    print(test)
    

    case to all columns is 0

    my_list_columns=['A','B','C','D']
    df['Output_col']=df[my_list_columns].apply(lambda x: '_'.join(x.index[x.eq(1)])  if x.eq(1).any() else 'no_value',axis=1)
    print(df)
    
       A  B  C  D Output_col
    0  1  0  0  0          A
    1  0  0  0  0   no_value
    2  0  0  1  0          C
    3  1  0  1  0        A_C
    4  1  0  1  0        A_C
    5  0  0  1  1        C_D