This is my initial approach:
In [91]: def f(dataframe,col):
    ...:     dataframe[col] = dataframe[col]*0
But this failed with the following:
In [90]: df=pd.DataFrame({'a':[1,2],'b':[4,5]})
In [91]: def f(dataframe,col):
    ...:     dataframe[col] = dataframe[col]*0
    ...:
In [92]: df.f('a')
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-92-e1a104c6b712> in <module>
----> 1 df.f('a')
~/.virtualenvs/this-env/lib/python3.7/site-packages/pandas/core/generic.py in __getattr__(self, name)
5177 if self._info_axis._can_hold_identifiers_and_holds_name(name):
5178 return self[name]
-> 5179 return object.__getattribute__(self, name)
5180
5181 def __setattr__(self, name, value):
AttributeError: 'DataFrame' object has no attribute 'f'
I assumed that this would be fairly well documented, but I can't find an example anywhere.
What you are trying to do is called monkey-patching. You need to write the function as a method (with self as its first parameter) and then assign it as an attribute of the pd.DataFrame class, not of an instantiated object.
import pandas as pd

def f(self, col):
    self.loc[:, col] = self.loc[:, col] * 0
    return self

pd.DataFrame.f = f
df = pd.DataFrame({'a': [1, 2], 'b': [4, 5]})
df.f('a')
# returns:
   a  b
0  0  4
1  0  5
Keep in mind that your method as written modifies the dataframe in place. If you need to preserve the original dataframe, call .copy() on it at the top of your function and work on the copy.
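A non-mutating variant might look something like the sketch below. The name f_copy is just a placeholder chosen for illustration, not anything pandas provides; the only pandas calls used are DataFrame.copy and ordinary column assignment.

```python
import pandas as pd

def f_copy(self, col):
    # Work on a copy so the caller's DataFrame is left untouched.
    result = self.copy()
    result[col] = result[col] * 0
    return result

# f_copy is a hypothetical name; it is attached to the class
# the same way as f above.
pd.DataFrame.f_copy = f_copy

df = pd.DataFrame({'a': [1, 2], 'b': [4, 5]})
zeroed = df.f_copy('a')  # column 'a' is zeroed in the returned frame only
```

Here df keeps its original values, and only the returned frame has column a set to zero.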