I have this dataset:
import pandas as pd
thedf = pd.DataFrame({'a':[10,20,0],'b':[9,16,15]})
thedf
a b
0 10 9
1 20 16
2 0 15
I want to create a new column with assign as part of a pandas method chain. To avoid division by zero, I put a conditional inside a lambda. I tried this code:
thedf.assign(division = lambda x: x['b']/x['a'] if x['a'] != 0 else 0)
But it returns an error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
The expected result is:
a b division
0 10 9 0.9
1 20 16 0.8
2 0 15 0
Please note that this question is about method chaining in pandas, so I'm looking for an answer that uses assign: I use method chaining to clean more complex datasets.
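x['a'] != 0 compares every value in the Series and returns a boolean Series, not a single True/False. The if ... else expression then has to turn that whole Series into one boolean, which is ambiguous, and that is why you see the ValueError.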
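To see the failure in isolation, here is a minimal sketch using the question's data. The comparison itself works and yields one boolean per row; the error only appears when Python asks for the truth value of the whole Series, which is what the conditional expression does implicitly.

```python
import pandas as pd

thedf = pd.DataFrame({'a': [10, 20, 0], 'b': [9, 16, 15]})

# The comparison is element-wise and returns a boolean Series, one value per row
mask = thedf['a'] != 0
print(mask.tolist())  # [True, True, False]

# A conditional expression needs a single True/False, so Python calls bool()
# on the whole Series, and pandas refuses because the answer is ambiguous
raised = False
try:
    bool(mask)
except ValueError as err:
    raised = True
    print(err)
```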
To choose a value row by row, you can use numpy.where:
import numpy as np
thedf.assign(division=lambda x: np.where(x["a"] != 0, x["b"] / x["a"], 0))
a b division
0 10 9 0.9
1 20 16 0.8
2 0 15 0.0
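If you would rather stay within pandas, Series.where makes the same element-wise choice as np.where: it keeps each value where the condition is True and substitutes the second argument elsewhere. A minimal sketch follows. As with np.where, the division is still evaluated for every row (the row with a == 0 produces inf in pandas rather than raising) and that value is then replaced by 0.

```python
import pandas as pd

thedf = pd.DataFrame({'a': [10, 20, 0], 'b': [9, 16, 15]})

result = thedf.assign(
    # Divide element-wise, then keep the ratio only where a is non-zero;
    # rows where a == 0 (ratio inf) are replaced with 0
    division=lambda x: x['b'].div(x['a']).where(x['a'] != 0, 0)
)
print(result)
```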
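You can also use apply with axis=1. It passes each row to the lambda as a Series of scalars, so the condition from the question works unchanged. On its own, thedf.apply(...) returns only the new Series, so wrap it in assign to add the column and keep the chain:
thedf.assign(division=lambda df: df.apply(lambda x: x["b"] / x["a"] if x["a"] != 0 else 0, axis=1))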
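When the row-wise rule grows beyond a one-liner, a named function can be easier to read inside a chain. The sketch below assumes a hypothetical helper called safe_ratio (not part of pandas) and applies it row by row inside assign.

```python
import pandas as pd

thedf = pd.DataFrame({'a': [10, 20, 0], 'b': [9, 16, 15]})


def safe_ratio(row):
    """Hypothetical helper: b / a for one row, or 0 when a is 0."""
    # row is a Series of scalars here, so a plain if/else is unambiguous
    return row['b'] / row['a'] if row['a'] != 0 else 0


result = (
    thedf
    # apply(axis=1) calls safe_ratio once per row and returns a Series
    .assign(division=lambda df: df.apply(safe_ratio, axis=1))
)
print(result)
```

Because apply with axis=1 calls the function once per row in Python, it is usually much slower than the vectorized np.where version on large DataFrames.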