These two lines of code differ only in the value passed for aggfunc. What is unclear to me is why the first case ("count") needs quotation marks while the second case (len) does not.
by_weekday1 = users.pivot_table(index='weekday', aggfunc='count')
by_weekday2 = users.pivot_table(index='weekday', aggfunc=len)
Thanks in advance!
Only functions that Pandas knows by name (NumPy or Pandas methods that Pandas treats as built-in, such as 'count', 'sum' or 'mean') can be specified as strings in quotation marks. Anything else has to be passed as a function object, which can be a NumPy function as well:
users.pivot_table(index='weekday', aggfunc='sum')
is similar to:
users.pivot_table(index='weekday', aggfunc=np.sum)
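To make the difference concrete, here is a minimal sketch. The `users` frame and its `age` column are made up for illustration and are not the data from the question. It also shows that the two calls in the question are not quite equivalent: 'count' skips missing values, while len counts every row in the group.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: two of the five ages are missing.
users = pd.DataFrame({
    "weekday": ["Mon", "Mon", "Tue", "Tue", "Tue"],
    "age": [25.0, np.nan, 31.0, 40.0, np.nan],
})

# A string names an aggregation that pandas already knows about.
by_count = users.pivot_table(index="weekday", values="age", aggfunc="count")

# len is a plain Python built-in, so it is passed as a function object.
by_len = users.pivot_table(index="weekday", values="age", aggfunc=len)

print(by_count)  # non-null values per weekday
print(by_len)    # all rows per weekday, NaN included
```

So aggfunc='len' would not work, because pandas has no aggregation with that name, while aggfunc=len works because any callable is accepted.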
UPDATE:
here is an excerpt from the source files:
def _python_agg_general(self, func, *args, **kwargs):
    func = self._is_builtin_func(func)
    ...
where _is_builtin_func() is defined as follows:
def _is_builtin_func(self, arg):
    """
    if we define an builtin function for this argument, return it,
    otherwise return the arg
    """
    return SelectionMixin._builtin_table.get(arg, arg)
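The `.get(arg, arg)` lookup in that excerpt can be tried in isolation. This is only a sketch: the excerpt does not show what `_builtin_table` contains, so the single entry below is an assumption for illustration, and `builtin_table` and `resolve` are hypothetical names rather than pandas API.

```python
import builtins

import numpy as np

# Assumed, illustrative table: maps a Python built-in to a NumPy equivalent.
# The real contents of pandas' _builtin_table are not shown in the excerpt.
builtin_table = {builtins.sum: np.sum}


def resolve(func):
    """Return the mapped replacement if there is one, otherwise func itself."""
    return builtin_table.get(func, func)


print(resolve(sum) is np.sum)  # the built-in is swapped for the mapped function
print(resolve(len) is len)     # len has no entry, so it is returned unchanged
```

Note that this lookup is keyed by function objects. As far as I can tell, string names such as 'count' are resolved separately, by finding the method with that name on the grouped object.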