python pandas function methods chaining

How to write a python function that can be used with pandas method chaining


This is my initial approach:

In [91]: def f(dataframe,col):
    ...:     dataframe[col] = dataframe[col]*0

But this failed with the following:

In [90]: df=pd.DataFrame({'a':[1,2],'b':[4,5]})

In [91]: def f(dataframe,col):
    ...:     dataframe[col] = dataframe[col]*0
    ...:

In [92]: df.f('a')
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-92-e1a104c6b712> in <module>
----> 1 df.f('a')

~/.virtualenvs/this-env/lib/python3.7/site-packages/pandas/core/generic.py in __getattr__(self, name)
   5177             if self._info_axis._can_hold_identifiers_and_holds_name(name):
   5178                 return self[name]
-> 5179             return object.__getattribute__(self, name)
   5180
   5181     def __setattr__(self, name, value):

AttributeError: 'DataFrame' object has no attribute 'f'

I assumed this would be fairly well documented, but I can't find an example anywhere.


Solution

  • What you are trying to do is called monkey-patching. Write the function as a method (it takes self as its first parameter), assign it as an attribute of the pd.DataFrame class rather than of an instance, and have it return the DataFrame so that further calls can be chained onto it.

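    import pandas as pd
    
    def f(self, col):
        # Zero out the column on the DataFrame the method is called on
        self.loc[:, col] = self.loc[:, col] * 0
        # Return the DataFrame so more methods can be chained after f
        return self
    
    # Attach f to the class, so every DataFrame instance gets df.f(...)
    pd.DataFrame.f = f
    
    df = pd.DataFrame({'a': [1, 2], 'b': [4, 5]})
    df.f('a')
    # returns:
    #    a  b
    # 0  0  4
    # 1  0  5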
    

    Keep in mind that the method as written modifies the DataFrame in place. If you need to preserve the original DataFrame, call self.copy() at the top of the function and modify and return the copy instead.
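A minimal sketch of the copy-based variant, plus two alternatives that pandas itself provides for chaining without patching the class: DataFrame.pipe, which calls a plain function with the DataFrame as its first argument, and pd.api.extensions.register_dataframe_accessor, which adds a namespaced attribute. The helper name zero_col and the accessor name "custom" are hypothetical, chosen only for illustration.

```python
import pandas as pd


def zero_col(self, col):
    """Hypothetical helper: return a copy of the DataFrame with `col` set to 0."""
    out = self.copy()          # work on a copy so the caller's DataFrame is untouched
    out[col] = out[col] * 0
    return out                 # returning a DataFrame is what makes chaining possible


# Option 1: monkey-patch the class (affects every DataFrame in this process).
pd.DataFrame.zero_col = zero_col

df = pd.DataFrame({'a': [1, 2], 'b': [4, 5]})
patched = df.zero_col('a').zero_col('b')

# Option 2: no patching; pipe calls zero_col(df, 'a') and returns its result.
piped = df.pipe(zero_col, 'a')


# Option 3: a registered accessor namespace ("custom" is a hypothetical name).
@pd.api.extensions.register_dataframe_accessor("custom")
class CustomAccessor:
    def __init__(self, pandas_obj):
        self._obj = pandas_obj

    def zero_col(self, col):
        # Delegate to the module-level helper defined above
        return zero_col(self._obj, col)


via_accessor = df.custom.zero_col('a')

print(df['a'].tolist())          # [1, 2]  (original is unchanged)
print(patched.values.tolist())   # [[0, 0], [0, 0]]
print(piped.values.tolist())     # [[0, 4], [0, 5]]
```

pipe keeps the function free-standing and avoids modifying pd.DataFrame globally, while an accessor groups custom methods under one attribute so they cannot silently shadow built-in DataFrame methods.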