I want to count, for each row of a Pandas DataFrame, how many columns have a value of 1. In other words, I am summing horizontally along each row, not vertically down each column. It seems like every example on the Internet sums columns vertically.
Here is my simple function:
def isOne(x):
    if (x == 1):
        return 1
    else:
        return 0
Here is my apply statement:
df.apply(lambda column: sum(isOne(column)), axis=1)
This is the error that I receive:
ValueError Traceback (most recent call last)
Cell In[30], line 2
1 tally = 0
----> 2 df.apply(lambda column: sum(isOne(column)), axis=1
3 )
File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\pandas\core\frame.py:9568, in DataFrame.apply(self, func, axis, raw, result_type, args, **kwargs)
9557 from pandas.core.apply import frame_apply
9559 op = frame_apply(
9560 self,
9561 func=func,
(...)
9566 kwargs=kwargs,
9567 )
-> 9568 return op.apply().__finalize__(self, method="apply")
File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\pandas\core\apply.py:764, in FrameApply.apply(self)
761 elif self.raw:
762 return self.apply_raw()
--> 764 return self.apply_standard()
File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\pandas\core\apply.py:891, in FrameApply.apply_standard(self)
890 def apply_standard(self):
--> 891 results, res_index = self.apply_series_generator()
893 # wrap results
894 return self.wrap_results(results, res_index)
File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\pandas\core\apply.py:907, in FrameApply.apply_series_generator(self)
904 with option_context("mode.chained_assignment", None):
905 for i, v in enumerate(series_gen):
906 # ignore SettingWithCopy here in case the user mutates
--> 907 results[i] = self.f(v)
908 if isinstance(results[i], ABCSeries):
909 # If we have a view on v, we need to make a copy because
910 # series_generator will swap out the underlying data
911 results[i] = results[i].copy(deep=False)
Cell In[30], line 2, in <lambda>(column)
1 tally = 0
----> 2 df.apply(lambda column: sum(isOne(column)), axis=1
3 )
Cell In[29], line 2, in isOne(x)
1 def isOne(x):
----> 2 if (x == 1):
3 return 1
4 else:
File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\pandas\core\generic.py:1527, in NDFrame.__nonzero__(self)
1525 @final
1526 def __nonzero__(self) -> NoReturn:
-> 1527 raise ValueError(
1528 f"The truth value of a {type(self).__name__} is ambiguous. "
1529 "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
1530 )
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I tried appending .any(), .all() and .item() to the boolean expression, but that had no effect.
If you want to count the number of 1's in each row, use vectorized code. It's a lot faster:
df.eq(1).sum(axis=1)
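As a quick illustration, here is that one-liner on a small made-up frame. The column names and values are hypothetical, not taken from your data:

```python
import pandas as pd

# Hypothetical sample data; the column names a, b, c are made up for illustration.
df = pd.DataFrame({
    "a": [1, 0, 1],
    "b": [1, 2, 1],
    "c": [0, 1, 1],
})

# eq(1) builds a boolean frame; summing along axis=1 counts the True values in each row.
ones_per_row = df.eq(1).sum(axis=1)
print(ones_per_row.tolist())  # [2, 1, 3]
```

The result is a Series indexed like df, so it can be assigned straight to a new column if that is what you need.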
If you want to use a lambda as practice: with axis=1, x is a Series containing the values for the row, so x == 1 is a boolean Series rather than a single True or False. Statements like "series equal to 1?" are ambiguous. Do you mean the series contains any 1, or all 1s? Try this:
import pandas as pd

def count_ones(x: pd.Series) -> int:
    # x == 1 is a boolean Series; summing it counts the True values
    return (x == 1).sum()

df.apply(count_ones, axis=1)
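To see where the original error comes from, the sketch below reproduces it on a single hypothetical row (the labels a, b, c are made up), which stands in for what apply passes to the function when axis=1:

```python
import pandas as pd

# Hypothetical single row, standing in for what apply(..., axis=1) hands to the function.
row = pd.Series({"a": 1, "b": 2, "c": 1})

mask = row == 1            # element-wise comparison: a boolean Series, not one bool
print(mask.tolist())       # [True, False, True]

raised = False
try:
    if mask:               # same situation as `if (x == 1):` inside isOne
        pass
except ValueError as err:
    raised = True
    print("ValueError:", err)

count = int(mask.sum())    # True counts as 1, so this is the number of ones
print(count)               # 2
```

Appending .any() or .all() makes the if statement legal, but it then answers a yes/no question about the whole row instead of counting the ones, which is why summing the boolean Series is the better fit here.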