I'm a newbie to Python, so please bear with me. I have a data frame in which I want to replace specific strings with numeric values. Below is my starting df (df_train):
A B C D
0 .5 Ex Ex Po
1 35 Gd TA Gd
2 52 TA Fa Ex
3 47 Gd Po Gd
I can easily replace the values I'd like and create a new df (df_train_scaled), as below:
df_train_scaled = df_train.replace(['Ex','Gd','TA','Fa','Po'], [5, 4, 3, 2, 1])
I'm curious whether I should do this and continue data pre-processing on the new df (df_train_scaled) before modeling, or whether I should create new columns in the same df (df_train). Regardless of the answer, I do want to figure out how to add new columns with the replaced values to the same df. Desired output below:
A B B_new C C_new D D_new
0 .5 Ex 5 Ex 5 Po 1
1 35 Gd 4 TA 3 Gd 4
2 52 TA 3 Fa 2 Ex 5
3 47 Gd 4 Po 1 Gd 4
If I do this, I can experiment to see if my ordinal, or scaled, variables will perform better in my modeling efforts. Thanks in advance for any help!
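You can build the replaced columns as a separate table, rename them, and concatenate the two tables column-wise. Note that this puts the _new columns after D rather than next to their originals: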
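import pandas as pd
# Replace the labels in every column except the first (A)
df_train_scaled = df_train.iloc[:, 1:].replace(['Ex','Gd','TA','Fa','Po'], [5, 4, 3, 2, 1])
df_train_scaled.columns = [x + "_new" for x in df_train_scaled.columns]
# concat returns a new frame, so assign the result to keep it
df_train = pd.concat([df_train, df_train_scaled], axis=1)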
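If you want each new column to sit right next to its original, as in the desired output in the question, one possible approach is to map each column through a dict and insert the result at the right position. This is only a sketch: the names quality_map, ordinal_cols and df_out are made up for illustration, and the sample frame is rebuilt from the desired output table in the question.

```python
import pandas as pd

# Sample data rebuilt from the question's desired output (an assumption)
df_train = pd.DataFrame({
    "A": [0.5, 35, 52, 47],
    "B": ["Ex", "Gd", "TA", "Gd"],
    "C": ["Ex", "TA", "Fa", "Po"],
    "D": ["Po", "Gd", "Ex", "Gd"],
})

# Hypothetical names: the label-to-score mapping and the columns to encode
quality_map = {"Ex": 5, "Gd": 4, "TA": 3, "Fa": 2, "Po": 1}
ordinal_cols = ["B", "C", "D"]

df_out = df_train.copy()  # leave the original frame untouched
for col in ordinal_cols:
    # Series.map gives NaN for any label that is missing from quality_map
    new_values = df_out[col].map(quality_map)
    # Insert the encoded column immediately after its source column
    df_out.insert(df_out.columns.get_loc(col) + 1, col + "_new", new_values)

print(df_out)
```

Unlike replace, map does not leave unknown labels in place; it turns them into NaN, which makes a mistyped label easy to spot with df_out.isna().sum().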