python pandas method-chaining

Copy a dataframe to a new variable in the middle of a method chain


Is it possible to copy a dataframe in the middle of a method chain to a new variable? Something like:

import pandas as pd

df = (pd.DataFrame([[2, 4, 6],
                    [8, 10, 12],
                    [14, 16, 18],
                    ])
      .assign(something_else=100)
      .div(2)
      .copy_to_new_variable(df_imag)  # Imaginary method that copies the intermediate df to df_imag.
      .div(10)
      )

print(df_imag) would then output:

     0    1    2  something_else
0  1.0  2.0  3.0            50.0
1  4.0  5.0  6.0            50.0
2  7.0  8.0  9.0            50.0

.copy_to_new_variable(df_imag) could be replaced by a separate statement, df_imag = df.copy(), but that would break the method chain.


Solution

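  • This is what I was looking for. The idea comes from Matt Harrison (who has written several books about pandas), who uses it for debugging method chains: a small helper, called through .pipe(), stores a copy of the intermediate dataframe under a given name and returns the dataframe unchanged so the chain can continue.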

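    import pandas as pd

    def to_df(df, name):
        # Store a copy under `name` in this module's global namespace,
        # then return the original so the chain can continue.
        globals()[name] = df.copy()
        return df

    df = (pd.DataFrame([[1, 2, 3],
                        [10, 10, 10],
                        ], columns=["A", "B", "C"]
                       )
          .set_index("C")
          .pipe(to_df, "df_imag")  # df_imag: the frame after set_index("C")
          .sum()                   # df itself ends up as the column sums (a Series)
          )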
    

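    df_imag then holds the intermediate dataframe (here, the state right after set_index("C")), as asked for in the question. Note that globals() refers to the module in which to_df is defined, so the new variable only appears next to the chain if both live in the same module or notebook.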
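
If writing into globals() feels too implicit, the same .pipe() idea can be sketched with an ordinary dict as the target. This is only a variation under stated assumptions, not part of the original answer: the helper name stash and the dict snapshots are hypothetical names made up for illustration, and the data is taken from the question.

```python
import pandas as pd


def stash(df, store, key):
    """Hypothetical helper: save a copy of df in store[key], return df unchanged."""
    store[key] = df.copy()
    return df


snapshots = {}  # hypothetical container for intermediate results

df = (pd.DataFrame([[2, 4, 6],
                    [8, 10, 12],
                    [14, 16, 18],
                    ])
      .assign(something_else=100)
      .div(2)
      .pipe(stash, snapshots, "df_imag")  # snapshot taken after .div(2)
      .div(10)                            # the chain continues on the original
      )

print(snapshots["df_imag"])
```

Because stash stores a copy, later steps in the chain cannot alter the snapshot, and since the dict is passed in explicitly the helper also works when it is defined in a different module than the chain.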