Tags: python, pandas, dataframe, python-applymap

Modifying an indexed DataFrame inside a function changes the original variable


The following script prints the same variable, input_df, twice at the end: once before and once after df_lower is called:

import pandas as pd

def df_lower(df):
    cols = ['col_1']
    df[cols] = df[cols].applymap(lambda x: x.lower())
    return df

input_df = pd.DataFrame({
    'col_1': ['ABC'],
    'col_2': ['XYZ']
})

print(input_df)
processed_df = df_lower(input_df)
print(input_df)

The output shows that input_df changes:

  col_1 col_2
0   ABC   XYZ
  col_1 col_2
0   abc   XYZ

Why is input_df modified?

Why isn't it modified when the whole input_df is processed (no column indexing), as in this version?

def df_lower_no_indexing(df):
    df = df.applymap(lambda x: x.lower())
    return df
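To see the difference directly, a minimal sketch can compare object identities with `is`. The helper names `lower_in_place` and `lower_rebind` are hypothetical stand-ins for the two functions above. They use `DataFrame.apply` with `Series.str.lower` instead of `applymap`, assuming every value is a string, because `applymap` is deprecated in newer pandas releases (pandas 2.1 renamed it to `DataFrame.map`).

```python
import pandas as pd


def lower_in_place(df):
    # Hypothetical stand-in for df_lower: df[cols] = ... calls __setitem__
    # on the very object the caller passed in, so the caller sees the change.
    cols = ['col_1']
    df[cols] = df[cols].apply(lambda s: s.str.lower())
    return df


def lower_rebind(df):
    # Hypothetical stand-in for df_lower_no_indexing: apply() returns a new
    # DataFrame and "df = ..." only rebinds the local name to it.
    df = df.apply(lambda s: s.str.lower())
    return df


a = pd.DataFrame({'col_1': ['ABC'], 'col_2': ['XYZ']})
out_a = lower_in_place(a)
print(out_a is a)          # True: the same object came back
print(a.loc[0, 'col_1'])   # abc: the caller's DataFrame was modified

b = pd.DataFrame({'col_1': ['ABC'], 'col_2': ['XYZ']})
out_b = lower_rebind(b)
print(out_b is b)          # False: a new DataFrame came back
print(b.loc[0, 'col_1'])   # ABC: the caller's DataFrame is untouched
```

Python passes the reference to the DataFrame, not a copy of it, so the distinction is whether the function mutates that shared object or merely points its local name at a different one.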

Solution

  • In df_lower you are assigning into the input DataFrame itself. In the no-indexing case, you are just binding a new value to the local variable df:

    df = df.applymap(lambda x: x.lower())
    

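    This rebinds the local name df to the new DataFrame returned by applymap, leaving the object that input_df refers to as it is.

    In the first case, by contrast, df and input_df refer to the same object, so assigning to a column selection with df[cols] = ... modifies the input itself: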

    df[cols] = df[cols].applymap(lambda x: x.lower())
    

    To get the same behavior in the first case, rebind df to a copy before assigning into it:

    def df_lower(df):
        cols = ['col_1']
        df = df.copy()  # work on a copy so the caller's DataFrame is left untouched
        df[cols] = df[cols].applymap(lambda x: x.lower())
        return df
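A non-mutating variant can also be written without an explicit copy, since `DataFrame.assign` returns a new DataFrame and never modifies the one it is called on. This is a minimal sketch, assuming the selected columns hold strings; making `cols` a parameter is a hypothetical change from the original signature, and `Series.str.lower` replaces `applymap` for the same deprecation reason.

```python
import pandas as pd


def df_lower(df, cols=('col_1',)):
    # assign() builds and returns a new DataFrame with the given columns
    # replaced; the DataFrame passed in by the caller is not modified.
    # Note: cols as a parameter is a hypothetical addition for illustration.
    return df.assign(**{c: df[c].str.lower() for c in cols})


input_df = pd.DataFrame({'col_1': ['ABC'], 'col_2': ['XYZ']})
processed_df = df_lower(input_df)
print(input_df)
print(processed_df)
```

Here input_df still prints ABC in col_1 while processed_df holds abc; col_2 is carried over unchanged in both.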