This is a general question about method chaining in Python classes. I have a class that performs some data wrangling operations, and each of its methods returns a dataframe. I'm trying to chain these methods after creating the object, much like Pandas does, but I'm running into some issues. Here's a simple example of a couple of methods in the class:
class Data:
    def __init__(self, df):
        self.df = df
    def remove_rows(self, col):
        df = (perform_some_operations)
        return df
    def collapse(self, cols):
        df = (perform_some_operations)
        return df
So I can use this like so:
df = Data(df)
df = df.remove_rows(col_1)
df = df.collapse(col_1)
However, if I want to use it like:
df = df.remove_rows(col_1).collapse(col_1)
I get errors. If I return self from these methods instead, I am able to chain them together, but the output is a Data object instead of a dataframe.
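To make the failure concrete, here is a minimal reproduction. The method bodies are hypothetical stand-ins for the real operations (dropping rows with missing values, dropping duplicate rows); the only point is that remove_rows hands back a plain DataFrame, and a DataFrame has no collapse method.

```python
import pandas as pd


class Data:
    def __init__(self, df):
        self.df = df

    def remove_rows(self, col):
        # Hypothetical stand-in for the real operation: drop rows missing `col`.
        df = self.df.dropna(subset=[col])
        return df  # a plain pandas DataFrame, not a Data object

    def collapse(self, cols):
        # Hypothetical stand-in: drop duplicate rows based on `cols`.
        df = self.df.drop_duplicates(subset=cols)
        return df


data = Data(pd.DataFrame({"col_1": [1.0, 1.0, None]}))

try:
    # remove_rows returns a DataFrame, so .collapse is looked up on the DataFrame.
    data.remove_rows("col_1").collapse("col_1")
    error = None
except AttributeError as exc:
    error = exc

print(type(error).__name__, "-", error)
```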
In Pandas, for example, you are able to do the following:
df = pd.read_csv('data.csv')
df = df.rename(columns={'col_1':'COL_1'}).drop(columns=['COL_1'])
and also
df = df.rename(columns={'col_1':'COL_1'})
df = df.drop(columns=['COL_1'])
I would like to understand how I can create methods that allow me both to chain operations and to use them separately to get values if I need to. I did some research and it seems like you can do one or the other, but if you take Pandas, for example, you are able to do both.
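A quick check of what Pandas does here may help frame the question. This is only a small sketch with made-up column values: with the default arguments, rename returns a new DataFrame rather than modifying the original, which is why the result can be either assigned or chained.

```python
import pandas as pd

df = pd.DataFrame({"col_1": [1, 2], "col_2": [3, 4]})

renamed = df.rename(columns={"col_1": "COL_1"})

print(type(renamed) is pd.DataFrame)  # the result is again a DataFrame
print(renamed is df)                  # False: it is a different object
print(list(df.columns))               # the original column names are unchanged

# Because the result is a DataFrame, the next method can be called on it directly.
chained = df.rename(columns={"col_1": "COL_1"}).drop(columns=["COL_1"])
print(list(chained.columns))
```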
You need to use self.df in the methods, like this:
class Data:
    def __init__(self, df):
        self.df = df
    def remove_rows(self, col):
        self.df = (perform_some_operations)
        return self.df
    def collapse(self, cols):
        self.df = (perform_some_operations)
        return self.df
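One caveat: returning self.df still hands back a DataFrame, so a following .collapse(...) call would be looked up on the DataFrame rather than on Data. A possible way to get both chaining and step-by-step use, sketched under assumptions rather than offered as the definitive design, is to have each method return a new Data object and read the frame from .df when the values are needed. The operation bodies below are hypothetical placeholders.

```python
import pandas as pd


class Data:
    def __init__(self, df):
        self.df = df

    def remove_rows(self, col):
        # Hypothetical placeholder operation: drop rows missing `col`.
        return Data(self.df.dropna(subset=[col]))

    def collapse(self, cols):
        # Hypothetical placeholder operation: drop duplicate rows based on `cols`.
        return Data(self.df.drop_duplicates(subset=cols))


raw = pd.DataFrame({"col_1": [1.0, 1.0, None], "col_2": ["a", "a", "b"]})

# Chained: every step returns a Data object, so the next method is available.
result = Data(raw).remove_rows("col_1").collapse("col_1").df

# Step by step: the same methods, used one at a time.
data = Data(raw)
data = data.remove_rows("col_1")
data = data.collapse("col_1")

print(result.equals(data.df))  # both routes produce the same DataFrame
print(len(raw), len(result))   # the original frame is left untouched
```

Returning a new object instead of mutating self mirrors the way Pandas methods behave by default, at the cost of an explicit .df at the end of the chain. If DataFrame methods should also be callable on the wrapper itself, forwarding unknown attributes with __getattr__ or subclassing DataFrame are alternatives worth looking at.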