Using lambda with conditional in pandas chain


I have this dataset:

import pandas as pd

thedf = pd.DataFrame({'a':[10,20,0],'b':[9,16,15]})
thedf
    a   b
0   10  9
1   20  16
2   0   15

I want to create a new column with assign in a pandas method chain. To avoid division by zero, I put a conditional inside the lambda:

thedf.assign(division = lambda x: x['b']/x['a'] if x['a'] != 0 else 0)

But it raises an error: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

The expected result is:

    a   b   division
0   10  9   0.9
1   20  16  0.8
2   0   15  0

Note that this question is specifically about method chaining in pandas, so I am looking for an answer that uses assign. I rely on method chaining to clean more complex datasets.


Solution

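  • x['a'] != 0 compares every value in the Series and returns a boolean Series, not a single True or False. Python's if ... else needs one truth value, which is why you are seeing the ValueError.

    To apply the condition element-wise, you can use numpy.where (with numpy imported as np):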

    thedf.assign(division=lambda x: np.where(x["a"] != 0, x["b"] / x["a"], 0))
    
        a   b  division
    0  10   9       0.9
    1  20  16       0.8
    2   0  15       0.0
    

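    You can also use apply with axis=1 to get the same values using the condition proposed in the question. Here each x is a single row, so x["a"] is a scalar and the conditional works. Note that this call returns a Series rather than a DataFrame with the new column, and row-wise apply is typically slower than the vectorized numpy.where.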

    thedf.apply(lambda x: x["b"] / x["a"] if x["a"] != 0 else 0, axis=1)
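
Since the question asks for a solution that stays inside a method chain, here is a minimal sketch that puts both approaches in assign on the same data. The column name division_rowwise and the variable names mask and result are only illustrative and do not appear in the original answer.

```python
import numpy as np
import pandas as pd

thedf = pd.DataFrame({"a": [10, 20, 0], "b": [9, 16, 15]})

# The comparison yields one boolean per row, not a single True/False,
# which is why a plain `if ... else` cannot evaluate it.
mask = thedf["a"] != 0
print(mask.tolist())

result = (
    thedf
    # Vectorized: np.where picks element-wise between the quotient and 0.
    .assign(division=lambda d: np.where(d["a"] != 0, d["b"] / d["a"], 0))
    # Row-wise: the original conditional works here because, with axis=1,
    # each x is one row and x["a"] is a scalar.
    .assign(
        division_rowwise=lambda d: d.apply(
            lambda x: x["b"] / x["a"] if x["a"] != 0 else 0, axis=1
        )
    )
)
print(result)
```

Both new columns should hold the same values. The np.where version is usually the better fit for larger frames, because it avoids calling a Python function once per row.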