python, pandas, replace, data-processing

Pandas - create new column with replaced values while keeping original column


I'm new to Python, so please bear with me. I have a DataFrame in which I want to replace specific string values with numbers. Below is my starting df (df_train):

       A    B     C     D
0     .5   Ex    Ex    Po
1     35   Gd    TA    Gd
2     52   TA    Fa    Ex
3     47   Gd    Po    Gd

I can easily replace the values and store the result in a new df (df_train_scaled):

df_train_scaled = df_train.replace(['Ex','Gd','TA','Fa','Po'], [5, 4, 3, 2, 1])
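For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the sample frame (the B, C and D codes are taken from the desired output table, so row 3 of B is "Gd") and runs the same replacement. The dict name ratings is just an illustrative choice; passing a dict of old value to new value to replace should behave the same as the two parallel lists.

```python
import pandas as pd

# Sample data reconstructed from the question's tables
df_train = pd.DataFrame({
    'A': [0.5, 35, 52, 47],
    'B': ['Ex', 'Gd', 'TA', 'Gd'],
    'C': ['Ex', 'TA', 'Fa', 'Po'],
    'D': ['Po', 'Gd', 'Ex', 'Gd'],
})

# Illustrative name: ordinal scale for the quality codes (old value -> new value)
ratings = {'Ex': 5, 'Gd': 4, 'TA': 3, 'Fa': 2, 'Po': 1}

# Equivalent to replace(['Ex','Gd','TA','Fa','Po'], [5, 4, 3, 2, 1]);
# returns a new frame and leaves df_train untouched
df_train_scaled = df_train.replace(ratings)
print(df_train_scaled)
```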

I'm not sure whether I should continue pre-processing in the new df (df_train_scaled) before modeling, or instead add new columns to the original df (df_train). Either way, I'd like to know how to add the replaced values as new columns in the same df. Desired output:

       A    B   B_new  C   C_new   D   D_new
0     .5   Ex     5    Ex     5    Po    1
1     35   Gd     4    TA     3    Gd    4
2     52   TA     3    Fa     2    Ex    5
3     47   Gd     4    Po     1    Gd    4

With both versions in one df, I can experiment to see whether the ordinal (scaled) variables perform better in my models. Thanks in advance for any help!


Solution

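  • You can replace the codes in the string columns, give those columns a "_new" suffix, and concatenate the result with the original frame side by side:

    import pandas as pd
    # Replace the codes in every column except the first (assumes A is column 0)
    df_train_scaled = df_train.iloc[:, 1:].replace(['Ex','Gd','TA','Fa','Po'], [5, 4, 3, 2, 1])
    df_train_scaled.columns = [x + "_new" for x in df_train_scaled.columns]
    # Put the new columns next to the originals and keep the result
    df_train = pd.concat([df_train, df_train_scaled], axis=1)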
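Note that pd.concat places all the "_new" columns after the originals (A, B, C, D, B_new, C_new, D_new). If you want the interleaved layout shown in the question, one possible sketch is to map each code column and insert the result directly after its source column. The names ratings and code_cols are illustrative, and the sample frame is rebuilt from the question's desired output.

```python
import pandas as pd

# Sample frame taken from the question's desired output (original columns only)
df_train = pd.DataFrame({
    'A': [0.5, 35, 52, 47],
    'B': ['Ex', 'Gd', 'TA', 'Gd'],
    'C': ['Ex', 'TA', 'Fa', 'Po'],
    'D': ['Po', 'Gd', 'Ex', 'Gd'],
})

# Illustrative names: the ordinal scale and the columns that hold quality codes
ratings = {'Ex': 5, 'Gd': 4, 'TA': 3, 'Fa': 2, 'Po': 1}
code_cols = ['B', 'C', 'D']

for col in code_cols:
    # Series.map looks each code up in the dict; codes not in the dict become NaN
    new_values = df_train[col].map(ratings)
    # Insert the new column immediately to the right of its source column
    df_train.insert(df_train.columns.get_loc(col) + 1, col + '_new', new_values)

print(df_train)
```

Using map instead of replace is a design choice: an unrecognised code (for example a typo in the raw data) shows up as NaN in the new column instead of silently passing through unchanged, which makes it easier to spot before modeling.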