The following script prints the same variable, input_df, twice: once before and once after df_lower has been called:
import pandas as pd

def df_lower(df):
    cols = ['col_1']
    df[cols] = df[cols].applymap(lambda x: x.lower())
    return df

input_df = pd.DataFrame({
    'col_1': ['ABC'],
    'col_2': ['XYZ']
})

print(input_df)
processed_df = df_lower(input_df)
print(input_df)
The output shows that input_df changes:

  col_1 col_2
0   ABC   XYZ
  col_1 col_2
0   abc   XYZ
Why is input_df modified?
Why isn't it modified when the full input_df (no column indexing) is processed?
def df_lower_no_indexing(df):
    df = df.applymap(lambda x: x.lower())
    return df
You are assigning to a slice of the input dataframe.
In the no-indexing case, you are just assigning a new value to the local variable df:
df = df.applymap(lambda x: x.lower())
This rebinds the local name df to a new DataFrame and leaves the input as is.
Conversely, in the first case you are assigning a value to a slice of the input object, which modifies the input itself:
df[cols] = df[cols].applymap(lambda x: x.lower())
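The difference can be checked directly with an identity test. This is only a minimal sketch: the function names rebind and mutate are made up for illustration, and it uses Series.str.lower instead of applymap, which recent pandas releases deprecate in favor of DataFrame.map.

```python
import pandas as pd

def rebind(df):
    # Plain assignment: the local name df now refers to a new DataFrame.
    df = df.apply(lambda s: s.str.lower())
    return df

def mutate(df):
    # Item assignment: writes into the very object the caller passed in.
    df[['col_1']] = df[['col_1']].apply(lambda s: s.str.lower())
    return df

original = pd.DataFrame({'col_1': ['ABC'], 'col_2': ['XYZ']})

rebound = rebind(original)
print(rebound is original)         # False
print(original['col_1'].tolist())  # ['ABC']

mutated = mutate(original)
print(mutated is original)         # True
print(original['col_1'].tolist())  # ['abc']
```

In rebind the caller's object is never written to; in mutate the function and the caller share one object, so the change is visible outside.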
With a simple change, you can work on a new object in the first case as well, by copying the input before assigning to it:

def df_lower(df):
    cols = ['col_1']
    df = df.copy()  # copy first so the caller's DataFrame is left untouched
    df[cols] = df[cols].applymap(lambda x: x.lower())
    return df
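Another way to leave the input alone, offered here as an alternative sketch rather than as part of the answer above, is DataFrame.assign, which returns a new DataFrame instead of writing into the existing one. The name df_lower_assign and the cols parameter are hypothetical.

```python
import pandas as pd

def df_lower_assign(df, cols=('col_1',)):
    # assign() builds and returns a new DataFrame; df itself is never written to.
    return df.assign(**{col: df[col].str.lower() for col in cols})

input_df = pd.DataFrame({'col_1': ['ABC'], 'col_2': ['XYZ']})
processed_df = df_lower_assign(input_df)

print(input_df['col_1'].tolist())      # ['ABC']
print(processed_df['col_1'].tolist())  # ['abc']
```

Because nothing is assigned into df, there is no need for an explicit copy and no way for the caller's data to change.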