
How Does Python Apply a Method from One Library to the Object of Another?


While using pandarallel to run .apply methods on my dataframes across all cores, I came across a use of dot syntax that I had never seen before and don't understand.

import pandas as pd
from pandarallel import pandarallel

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})


So far so good, just setting up a dataframe. Next, to get pandarallel ready, we do

pandarallel.initialize()


Next up is the bit where I am confused: to use pandarallel we call this method on the dataframe

df.parallel_apply(func)


My question is: if the dataframe df was instantiated using the pandas library, and pandas does not have a method called parallel_apply, how is it that Python knows to use the pandarallel method on the pandas object?

I presume it's something to do with the initialization, but I have never seen this before and I don't understand what's happening in the back end.


Solution

  • You can attach your own methods to a class after it has been defined, and objects that were already created from that class pick them up too:

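    def my_func(self):
        # self is the DataFrame the method is called on
        return 2 * self

    # Attach the function to the class; every DataFrame now has my_method
    pd.DataFrame.my_method = my_func

    df.my_method()

       a   b
    0  2   8
    1  4  10
    2  6  12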
    

    You can even pass arguments:

    def sum_x(self, x):
        # x is the extra argument supplied at the call site
        return self + x

    pd.DataFrame.sum_x = sum_x

    df.sum_x(3)

       a  b
    0  4  7
    1  5  8
    2  6  9
    
    

    The first argument receives the instance the method is called on (self), exactly as in a method defined inside the class body.
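
This is presumably also what happens when pandarallel.initialize() runs: the call assigns new attributes such as parallel_apply to the DataFrame class, and because Python looks methods up on the class at call time, a dataframe created earlier sees them as well. The sketch below illustrates the pattern without third-party packages. Table and ToyParallel are hypothetical stand-ins for pd.DataFrame and pandarallel, not their real implementations, and the "parallel" method simply runs sequentially.

```python
class Table:
    """Hypothetical stand-in for pd.DataFrame: a thin wrapper around rows."""

    def __init__(self, rows):
        self.rows = rows

    def apply(self, func):
        # Ordinary method defined in the class body: call func on every row.
        return [func(row) for row in self.rows]


class ToyParallel:
    """Hypothetical helper library that extends Table when initialized."""

    @classmethod
    def initialize(cls):
        def parallel_apply(self, func):
            # A real library would split the rows across worker processes;
            # this sketch just delegates to the sequential apply.
            return self.apply(func)

        # Attach the function to the class itself, so every instance,
        # including ones created earlier, can call it.
        Table.parallel_apply = parallel_apply


t = Table([[1, 4], [2, 5], [3, 6]])        # created before initialize()
has_before = hasattr(t, "parallel_apply")  # method does not exist yet

ToyParallel.initialize()
has_after = hasattr(t, "parallel_apply")   # now found via the class
row_sums = t.parallel_apply(sum)

print(has_before, has_after, row_sums)     # False True [5, 7, 9]
```

Nothing is copied onto the object t itself. The attribute lives on the class, which is why the initialization step only has to run once per session.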