pandas dataframe apply

Error> The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()


For each row of a pandas DataFrame, I want to count the columns whose value is 1. I am summing horizontally along each row, not vertically down each column. Nearly every example I can find online sums columns vertically.

Here is my simple function:

def isOne(x):
    if (x == 1):
        return 1 
    else:
        return 0

Here is my apply statement:

df.apply(lambda column: sum(isOne(column)), axis=1)

This is the error that I receive:

ValueError                                Traceback (most recent call last)
Cell In[30], line 2
      1 tally = 0
----> 2 df.apply(lambda column: sum(isOne(column)), axis=1
      3         )

File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\pandas\core\frame.py:9568, in DataFrame.apply(self, func, axis, raw, result_type, args, **kwargs)
   9557 from pandas.core.apply import frame_apply
   9559 op = frame_apply(
   9560     self,
   9561     func=func,
   (...)
   9566     kwargs=kwargs,
   9567 )
-> 9568 return op.apply().__finalize__(self, method="apply")

File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\pandas\core\apply.py:764, in FrameApply.apply(self)
    761 elif self.raw:
    762     return self.apply_raw()
--> 764 return self.apply_standard()

File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\pandas\core\apply.py:891, in FrameApply.apply_standard(self)
    890 def apply_standard(self):
--> 891     results, res_index = self.apply_series_generator()
    893     # wrap results
    894     return self.wrap_results(results, res_index)

File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\pandas\core\apply.py:907, in FrameApply.apply_series_generator(self)
    904 with option_context("mode.chained_assignment", None):
    905     for i, v in enumerate(series_gen):
    906         # ignore SettingWithCopy here in case the user mutates
--> 907         results[i] = self.f(v)
    908         if isinstance(results[i], ABCSeries):
    909             # If we have a view on v, we need to make a copy because
    910             #  series_generator will swap out the underlying data
    911             results[i] = results[i].copy(deep=False)

Cell In[30], line 2, in <lambda>(column)
      1 tally = 0
----> 2 df.apply(lambda column: sum(isOne(column)), axis=1
      3         )

Cell In[29], line 2, in isOne(x)
      1 def isOne(x):
----> 2     if (x == 1):
      3         return 1 
      4     else:

File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\pandas\core\generic.py:1527, in NDFrame.__nonzero__(self)
   1525 @final
   1526 def __nonzero__(self) -> NoReturn:
-> 1527     raise ValueError(
   1528         f"The truth value of a {type(self).__name__} is ambiguous. "
   1529         "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
   1530     )

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I tried appending .any(), .all(), and .item() to the boolean expression, but that had no effect.


Solution

  • If you want to count the number of 1's in each row, use vectorized code. It's a lot faster:

    df.eq(1).sum(axis=1)
    

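    If you want to use apply as practice: with axis=1, x is a Series holding the values of one row. x == 1 therefore produces a boolean Series, and an if statement cannot turn that into a single True or False, which is why the error calls it "ambiguous". Do you mean the Series contains any 1, or that all of its values are 1? To count the 1's, sum the boolean Series instead: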

    import pandas as pd

    def count_ones(x: pd.Series) -> int:
        # (x == 1) is a boolean Series; summing it counts the True values
        return (x == 1).sum()
    
    df.apply(count_ones, axis=1)
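
To see both approaches side by side, here is a minimal sketch. The DataFrame and its column names (a, b, c) are made up for illustration, since the question does not show its data. The final part reproduces the original error by feeding a whole row to an if statement.

```python
import pandas as pd

# Hypothetical sample data; the question does not show the real DataFrame.
df = pd.DataFrame({"a": [1, 0, 1], "b": [1, 1, 0], "c": [1, 0, 3]})

# Vectorized: compare every cell to 1, then add up the True values per row.
vectorized = df.eq(1).sum(axis=1)

# Row-wise apply: each row arrives as a Series and is reduced with .sum().
def count_ones(x: pd.Series) -> int:
    return (x == 1).sum()

applied = df.apply(count_ones, axis=1)

print(vectorized.tolist())  # [3, 1, 1]
print(applied.tolist())     # [3, 1, 1]

# Reproduce the error: `if` needs a single bool but receives a boolean Series.
try:
    if df.iloc[0] == 1:
        pass
    raised = False
except ValueError as exc:
    raised = True
    print(exc)
```

Both versions give the same per-row counts. The vectorized form avoids calling a Python function once per row, which is why it tends to be faster on large frames.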