Set up the dataframe:
import pandas as pd
import numpy as np
np.random.seed(99)
rows = 10
df = pd.DataFrame({'A': np.random.choice(range(0, 2), rows, replace=True),
                   'B': np.random.choice(range(0, 2), rows, replace=True)})
df
A B
0 1 1
1 1 1
2 1 0
3 0 1
4 1 1
5 0 1
6 0 1
7 0 0
8 1 1
9 0 1
I would like to add a column 'C' with the value 'X' if df.A and df.B are both 0, and the value 'Y' otherwise.
I tried:
df.assign(C = lambda row: 'X' if row.A + row.B == 0 else 'Y')
but that does not work...
I found other ways to get my results but would like to use .assign
with a lambda function in this situation.
Any suggestions on how to get assign with lambda working?
You can do this vectorised:
import numpy as np
df['C'] = np.where(df['A'] + df['B'] == 0, 'X', 'Y')
The lambda solution has no benefit here, but if you want it...
df = df.assign(C=np.where(df.pipe(lambda x: x['A'] + x['B'] == 0), 'X', 'Y'))
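For what it's worth, the attempt in the question fails because assign calls the lambda once with the whole dataframe, not once per row, so row.A + row.B == 0 is a Series and the if raises "The truth value of a Series is ambiguous". A minimal sketch of passing the callable straight to assign, without pipe, might look like this (the small frame and the names d and result are made up for illustration):

```python
import numpy as np
import pandas as pd

# Small hand-written frame so the result is easy to check by eye (illustrative data).
df = pd.DataFrame({'A': [1, 0, 0, 1], 'B': [1, 0, 1, 0]})

# assign calls the lambda once with the whole DataFrame (named d here),
# so d['A'] + d['B'] is a Series and np.where evaluates it element-wise.
result = df.assign(C=lambda d: np.where(d['A'] + d['B'] == 0, 'X', 'Y'))
print(result)
```

This is still the vectorised np.where under the hood; the lambda only defers the lookup of the columns, which mostly matters when assign sits in the middle of a method chain.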
The bad way to use assign + lambda:
df = df.assign(C=df.apply(lambda x: 'X' if x.A + x.B == 0 else 'Y', axis=1))
What's wrong with the bad way is that you are iterating over rows in a Python-level loop. It's often worse than a regular Python for loop.
The first two solutions perform vectorised operations on contiguous memory blocks, and are processed more efficiently as a result.
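If you want to see the difference yourself, a rough timing sketch along these lines should do. The frame size, the repeat count and the helper names vectorised and row_wise are arbitrary choices for illustration, and the actual numbers will depend on your machine and pandas version:

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 10_000  # arbitrary size, chosen only for illustration
big = pd.DataFrame({'A': rng.integers(0, 2, n), 'B': rng.integers(0, 2, n)})

def vectorised(d):
    # One element-wise comparison over whole columns.
    return np.where(d['A'] + d['B'] == 0, 'X', 'Y')

def row_wise(d):
    # Calls the Python lambda once per row.
    return d.apply(lambda r: 'X' if r['A'] + r['B'] == 0 else 'Y', axis=1)

# Both approaches should produce identical labels.
same = bool((vectorised(big) == row_wise(big).to_numpy()).all())

t_vec = timeit.timeit(lambda: vectorised(big), number=3)
t_row = timeit.timeit(lambda: row_wise(big), number=3)
print(f"same result: {same}, vectorised: {t_vec:.4f}s, row-wise: {t_row:.4f}s")
```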