I am having trouble passing numpy.float64 values as arguments to pandas.Series.apply(). Is there a way to force the pandas versions of .mean() and .std() so that pandas accepts them?
The Code
def normalization(val_to_norm, col_mean, col_sd):
    return (val_to_norm - col_mean) / col_sd
voting_df['pop_estimate'].info()
pop_mean, pop_sd = voting_df['pop_estimate'].mean(), voting_df['pop_estimate'].std()
voting_df['pop_estimate'] = voting_df['pop_estimate'].apply(normalization, pop_mean, pop_sd)
Output
The key line is at the bottom.
<class 'pandas.core.series.Series'>
Int64Index: 3145 entries, 0 to 3144
Series name: pop_estimate
Non-Null Count Dtype
-------------- -----
3145 non-null float64
dtypes: float64(1)
memory usage: 49.1 KB
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In [46], line 7
4 voting_df['pop_estimate'].info()
6 pop_mean, pop_sd = voting_df['pop_estimate'].mean(), voting_df['pop_estimate'].std()
----> 7 voting_df['pop_estimate'] = voting_df['pop_estimate'].apply(normalization, pop_mean, pop_sd)
File c:\Users\chris\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\series.py:4774, in Series.apply(self, func, convert_dtype, args, **kwargs)
4664 def apply(
4665 self,
4666 func: AggFuncType,
(...)
4669 **kwargs,
4670 ) -> DataFrame | Series:
4671 """
4672 Invoke function on values of Series.
4673
(...)
4772 dtype: float64
4773 """
-> 4774 return SeriesApply(self, func, convert_dtype, args, kwargs).apply()
File c:\Users\chris\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\apply.py:1100, in SeriesApply.apply(self)
1097 return self.apply_str()
1099 # self.f is Callable
-> 1100 return self.apply_standard()
File c:\Users\chris\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\apply.py:1151, in SeriesApply.apply_standard(self)
1149 else:
1150 values = obj.astype(object)._values
-> 1151 mapped = lib.map_infer(
1152 values,
1153 f,
1154 convert=self.convert_dtype,
1155 )
1157 if len(mapped) and isinstance(mapped[0], ABCSeries):
1158 # GH#43986 Need to do list(mapped) in order to get treated as nested
1159 # See also GH#25959 regarding EA support
1160 return obj._constructor_expanddim(list(mapped), index=obj.index)
File c:\Users\chris\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\_libs\lib.pyx:2919, in pandas._libs.lib.map_infer()
File c:\Users\chris\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\apply.py:139, in Apply.__init__.<locals>.f(x)
138 def f(x):
--> 139 return func(x, *args, **kwargs)
TypeError: Value after * must be an iterable, not numpy.float64
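To pass additional arguments to a function called through pd.Series.apply, supply them either as keyword arguments or as a tuple through the args keyword argument. In your call, the extra positional values are bound to apply's own parameters: pop_mean becomes convert_dtype and pop_sd becomes args, so pandas tries to unpack a numpy.float64 with *args, which raises the TypeError. The values coming from .mean() and .std() are fine as they are.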
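To illustrate the mechanism without pandas, here is a minimal sketch. The function apply_like is a hypothetical stand-in, not pandas code; it only copies the parameter order visible in the traceback (func, convert_dtype, args, **kwargs). The numbers 2.0 and 1.0 are made up.

```python
def normalization(val_to_norm, col_mean, col_sd):
    return (val_to_norm - col_mean) / col_sd


# Hypothetical stand-in with the same parameter order as Series.apply
# in the traceback; it is NOT the pandas implementation.
def apply_like(values, func, convert_dtype=True, args=(), **kwargs):
    return [func(x, *args, **kwargs) for x in values]


values = [1.0, 2.0, 3.0]

# Extra positionals: 2.0 is bound to convert_dtype and 1.0 to args,
# so *args tries to unpack a float and raises TypeError.
try:
    apply_like(values, normalization, 2.0, 1.0)
    error = None
except TypeError as exc:
    error = exc

print(type(error).__name__)

# Passing the extras as a tuple through args works as intended.
fixed = apply_like(values, normalization, args=(2.0, 1.0))
print(fixed)
```

The same binding happens with a plain Python float, which suggests the numpy.float64 type itself is not the problem.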
From the docs:
Series.apply(func, convert_dtype=True, args=(), **kwargs)
Invoke function on values of Series.
Can be ufunc (a NumPy function that applies to the entire Series) or a Python function that only works on single values.
Parameters
func: function
    Python function or NumPy ufunc to apply.
convert_dtype: bool, default True
    Try to find better dtype for elementwise function results. If False, leave as dtype=object. Note that the dtype is always preserved for some extension array dtypes, such as Categorical.
args: tuple
    Positional arguments passed to func after the series value.
**kwargs
    Additional keyword arguments passed to func.
So to call this with positional arguments:
voting_df['pop_estimate'].apply(normalization, args=(pop_mean, pop_sd))
Alternatively, with keyword arguments:
voting_df['pop_estimate'].apply(normalization, col_mean=pop_mean, col_sd=pop_sd)
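Putting it together, here is a self-contained sketch you can run. The four pop_estimate values are made up for illustration and only stand in for your real voting_df. The last line also shows plain Series arithmetic as an alternative, which should give the same result without apply.

```python
import pandas as pd


def normalization(val_to_norm, col_mean, col_sd):
    return (val_to_norm - col_mean) / col_sd


# Made-up data standing in for the real voting_df
voting_df = pd.DataFrame({"pop_estimate": [100.0, 200.0, 300.0, 400.0]})
col = voting_df["pop_estimate"]
pop_mean, pop_sd = col.mean(), col.std()

# Extra arguments as a tuple through the args keyword
by_args = col.apply(normalization, args=(pop_mean, pop_sd))

# Extra arguments as keyword arguments forwarded to normalization
by_kwargs = col.apply(normalization, col_mean=pop_mean, col_sd=pop_sd)

# Alternative: Series arithmetic on the whole column, no apply needed
vectorized = (col - pop_mean) / pop_sd

print(by_args)
```

All three results should match. To overwrite the column as in your original code, assign one of them back to voting_df['pop_estimate'].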