Tags: python, pandas, dataframe, chaining

Method chaining and returning value from method


This is a general question about method chaining in Python classes. I have a class that performs some data wrangling operations, and each of its methods returns a dataframe. I'm trying to chain these methods after creating the object, much like Pandas does, but I'm running into some issues. Here's a simple example of a couple of methods in the class:

class Data:

    def __init__(self, df):
        self.df = df

    def remove_rows(self, col):
        df = (perform_some_operations)
        return df

    def collapse(self, cols):
        df = (perform_some_operations)
        return df

So I can use this like so:

df = Data(df)
df = df.remove_rows(col_1)
df = df.collapse(col_1)

However, if I want to use it like:

df = df.remove_rows(col_1).collapse(col_1)

I get an error, because remove_rows returns a plain dataframe, which has no collapse method. If I return self from these methods instead, I am able to chain them together, but the output is the Data object instead of a dataframe.

In Pandas you can, for example, do the following:

df = pd.read_csv('data.csv')
df = df.rename(columns={'col_1':'COL_1'}).drop(columns=['COL_1'])

and also

df = df.rename(columns={'col_1':'COL_1'})
df = df.drop(columns=['COL_1'])

I would like to understand how I can create methods that let me both chain operations and use them separately to get values when I need to. From my research it seems you can do one or the other, yet Pandas, for example, lets you do both.


Solution

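  • Store the result on self.df in each method and return self, so that the next call in the chain still receives a Data object. Read the dataframe from the .df attribute whenever you need it:

    class Data:
    
        def __init__(self, df):
            self.df = df
    
        def remove_rows(self, col):
            self.df = (perform_some_operations)
            # Returning self.df here would hand back a DataFrame, which has no collapse()
            return self
    
        def collapse(self, cols):
            self.df = (perform_some_operations)
            return self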
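
One way to get both chaining and step-by-step use is to have every method return a Data object and to read the dataframe from .df at the point where you need it. The following is a minimal sketch rather than a definitive implementation. It assumes dropna and drop_duplicates as stand-ins for the elided operations, and the sample data is made up for illustration. It also returns a new Data from each method instead of mutating the current one, which is closer to how Pandas methods behave by default.

```python
import pandas as pd


class Data:
    """Thin wrapper around a DataFrame whose methods can be chained."""

    def __init__(self, df):
        self.df = df

    def remove_rows(self, col):
        # Stand-in operation (an assumption): drop rows where `col` is missing.
        return Data(self.df.dropna(subset=[col]))

    def collapse(self, cols):
        # Stand-in operation (an assumption): keep the first row per value of `cols`.
        return Data(self.df.drop_duplicates(subset=cols))


# Made-up sample data for illustration.
raw = pd.DataFrame({"col_1": [1.0, 1.0, None, 2.0], "col_2": list("abcd")})

# Chained: every call returns a Data, so the next method is available.
chained = Data(raw).remove_rows("col_1").collapse(["col_1"]).df

# Step by step: the same methods, with the dataframe read off via .df.
step = Data(raw).remove_rows("col_1")
stepwise = step.collapse(["col_1"]).df
```

Because each method wraps its result in a fresh Data, the original dataframe and any intermediate objects are left untouched, and the trailing .df is the only extra step compared with chaining on a plain dataframe.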