
python pandas apply not accepting numpy.float64 args


I am having trouble passing numpy.float64 values as arguments to pandas.Series.apply(). Is there a way to force pandas' own versions of .mean() and .std() so that apply() accepts the results?

The Code

def normalization(val_to_norm, col_mean, col_sd):
    return (val_to_norm - col_mean) / col_sd

voting_df['pop_estimate'].info()

pop_mean, pop_sd = voting_df['pop_estimate'].mean(), voting_df['pop_estimate'].std()

voting_df['pop_estimate'] = voting_df['pop_estimate'].apply(normalization, pop_mean, pop_sd)

Output

The key line is at the bottom.

<class 'pandas.core.series.Series'>
Int64Index: 3145 entries, 0 to 3144
Series name: pop_estimate
Non-Null Count  Dtype  
--------------  -----  
3145 non-null   float64
dtypes: float64(1)
memory usage: 49.1 KB

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In [46], line 7
      4 voting_df['pop_estimate'].info()
      6 pop_mean, pop_sd = voting_df['pop_estimate'].mean(), voting_df['pop_estimate'].std()
----> 7 voting_df['pop_estimate'] = voting_df['pop_estimate'].apply(normalization, pop_mean, pop_sd)

File c:\Users\chris\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\series.py:4774, in Series.apply(self, func, convert_dtype, args, **kwargs)
   4664 def apply(
   4665     self,
   4666     func: AggFuncType,
   (...)
   4669     **kwargs,
   4670 ) -> DataFrame | Series:
   4671     """
   4672     Invoke function on values of Series.
   4673 
   (...)
   4772     dtype: float64
   4773     """
-> 4774     return SeriesApply(self, func, convert_dtype, args, kwargs).apply()

File c:\Users\chris\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\apply.py:1100, in SeriesApply.apply(self)
   1097     return self.apply_str()
   1099 # self.f is Callable
-> 1100 return self.apply_standard()

File c:\Users\chris\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\apply.py:1151, in SeriesApply.apply_standard(self)
   1149     else:
   1150         values = obj.astype(object)._values
-> 1151         mapped = lib.map_infer(
   1152             values,
   1153             f,
   1154             convert=self.convert_dtype,
   1155         )
   1157 if len(mapped) and isinstance(mapped[0], ABCSeries):
   1158     # GH#43986 Need to do list(mapped) in order to get treated as nested
   1159     #  See also GH#25959 regarding EA support
   1160     return obj._constructor_expanddim(list(mapped), index=obj.index)

File c:\Users\chris\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\_libs\lib.pyx:2919, in pandas._libs.lib.map_infer()

File c:\Users\chris\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\apply.py:139, in Apply.__init__.<locals>.f(x)
    138 def f(x):
--> 139     return func(x, *args, **kwargs)

TypeError: Value after * must be an iterable, not numpy.float64

Solution

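  • To pass additional arguments to a function called with pd.Series.apply, supply them either as keyword arguments or as a tuple through the args keyword argument. Extra positional arguments given directly to apply are bound to apply's own parameters: in your call, pop_mean is taken as convert_dtype and pop_sd as args, so func(x, *args) tries to unpack a single numpy.float64 and raises the TypeError. The type of the values is not the problem; a plain Python float would fail the same way.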

    From the docs:

    Series.apply(func, convert_dtype=True, args=(), **kwargs)

    Invoke function on values of Series.

    Can be ufunc (a NumPy function that applies to the entire Series) or a Python function that only works on single values.

    Parameters

    func: function
    Python function or NumPy ufunc to apply.

    convert_dtype: bool, default True
    Try to find better dtype for elementwise function results. If False, leave as dtype=object. Note that the dtype is always preserved for some extension array dtypes, such as Categorical.

    args: tuple
    Positional arguments passed to func after the series value.

    **kwargs
    Additional keyword arguments passed to func.

    So to call this with positional arguments:

    voting_df['pop_estimate'].apply(normalization, args=(pop_mean, pop_sd))
    

    Alternatively, with keyword arguments:

    voting_df['pop_estimate'].apply(normalization, col_mean=pop_mean, col_sd=pop_sd)
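For anyone who wants to check this without the original data, here is a minimal sketch of both calling styles. The Series `pop` and its four values are made up as a stand-in for voting_df['pop_estimate']; only normalization comes from the question. The last line is a different technique, plain vectorized arithmetic without apply, included because it should give the same numbers for a simple z-score like this one.

```python
import numpy as np
import pandas as pd


def normalization(val_to_norm, col_mean, col_sd):
    # z-score of a single value
    return (val_to_norm - col_mean) / col_sd


# Hypothetical stand-in for voting_df['pop_estimate'] (values are made up)
pop = pd.Series([100.0, 200.0, 300.0, 400.0], name="pop_estimate")

# .mean() and .std() return numpy.float64 scalars, which is fine
pop_mean, pop_sd = pop.mean(), pop.std()

# Extra positional arguments go in a tuple passed through args=
by_args = pop.apply(normalization, args=(pop_mean, pop_sd))

# Or pass them by name; apply forwards unknown keywords to the function
by_kwargs = pop.apply(normalization, col_mean=pop_mean, col_sd=pop_sd)

# Alternative without apply: arithmetic on the whole Series at once
vectorized = (pop - pop_mean) / pop_sd
```

All three results should agree. For a large column, the vectorized form is usually the faster choice because it avoids calling a Python function once per row.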