When using pandarallel to use all cores when running .apply methods on my dataframes, I came across a syntax which I had never seen before. Or rather, it's a way of using dot syntax that I don't understand.
import pandas as pd
from pandarallel import pandarallel
df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
So far so good, just setting up a dataframe. Next, to get pandarallel ready, we do
pandarallel.initialize()
Next up is the bit where I am confused: to use pandarallel we call this method on the dataframe
df.parallel_apply(func)
My question is: if the dataframe df was instantiated using the pandas library, and pandas does not have a method called parallel_apply, how is it that Python knows to use the pandarallel method on the pandas object?
I presume it's something to do with the initialization, but I have never seen this before and I don't understand what's happening in the back end.
You can attach your own methods to an existing class, and they become available on objects that were already created from it:
def my_func(self):
    return 2 * self
pd.DataFrame.my_method = my_func
df.my_method()
   a   b
0  2   8
1  4  10
2  6  12
You can even pass arguments:
def sum_x(self, x):
    return self + x
pd.DataFrame.sum_x = sum_x
df.sum_x(3)
   a  b
0  4  7
1  5  8
2  6  9
The first argument will be self, just as in a regular method defined inside a class.
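Presumably an initialization call like pandarallel.initialize() does something along these lines, though the following is only a minimal sketch of the pattern and not pandarallel's actual code. Frame, initialize and fancy_apply are hypothetical names; Frame stands in for a library class such as pd.DataFrame so that the example runs without pandas.

```python
class Frame:
    """Hypothetical stand-in for a library class such as pd.DataFrame."""

    def __init__(self, values):
        self.values = list(values)


def initialize():
    """Hypothetical plugin setup: attach an extra method to Frame."""

    def fancy_apply(self, func):
        # `self` is whichever Frame instance the method is called on.
        return [func(v) for v in self.values]

    # Assigning a function to the class makes it a method of every instance.
    Frame.fancy_apply = fancy_apply


frame = Frame([1, 2, 3])              # created before the patch
print(hasattr(frame, "fancy_apply"))  # False: the method does not exist yet

initialize()
print(hasattr(frame, "fancy_apply"))  # True: lookup falls back to the class
result = frame.fancy_apply(lambda v: v * 10)
print(result)                         # [10, 20, 30]
```

The object created before initialize() picks up the new method because Python does not copy methods into each instance. When an attribute is not found on the instance itself, the lookup continues on its class, so anything added to the class later is visible immediately.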