Hello I am working on pandas dataframe and I want to create a column combining multiple columns and applying condition on them and I am looking for a smart way to do it.
Suppose the data frame looks as
A B C D
1 0 0 0
0 1 0 0
0 0 1 0
1 0 1 0
1 1 1 0
0 0 1 1
My output column should be as below
A B C D Output_col
1 0 0 0 A
0 1 0 0 B
0 0 1 0 C
1 0 1 0 A_C
1 1 1 0 A_B_C
0 0 1 1 C_D
I can certainly achieve this using below code but then I have to do it for every column.
test['Output_col'] = test.A.apply(lambda x: A if x > 0 else 0)
I was wondering if there is a way where I could achieve this without applying to every column if I have very large number of columns.
Thanks in advance !!
Use DataFrame.apply
+ join
.
Select column names using x.index
(
note that axis = 1
is used) + boolean indexing
with Series.eq
to filter the selected columns :
test['Output_col']=test.apply(lambda x: '_'.join(x.index[x.eq(1)]),axis=1)
print(test)
A B C D Output_col
0 1 0 0 0 A
1 0 1 0 0 B
2 0 0 1 0 C
3 1 0 1 0 A_C
4 1 1 1 0 A_B_C
5 0 0 1 1 C_D
To apply only a list of columns:
my_list_columns=['enter element of your list']
test['Output_col']=test[my_list_columns].apply(lambda x: '_'.join(x.index[x.eq(1)]),axis=1)
print(test)
case to all columns is 0
my_list_columns=['A','B','C','D']
df['Output_col']=df[my_list_columns].apply(lambda x: '_'.join(x.index[x.eq(1)]) if x.eq(1).any() else 'no_value',axis=1)
print(df)
A B C D Output_col
0 1 0 0 0 A
1 0 0 0 0 no_value
2 0 0 1 0 C
3 1 0 1 0 A_C
4 1 0 1 0 A_C
5 0 0 1 1 C_D