
How to generate many new columns in a pandas DataFrame conveniently?


Suppose I have a DataFrame df with many columns, and I want to generate a large number of new variables from them. The naive way I can think of is:

df['new_var_1'] = df['var_1'] + df['var_2']
df['new_var_2'] = df['var_2'] / (df['var_3'] + df['var_4'])
df['new_var_3'] = ...
...
df['new_var_100'] = ...

However, typing df[...] over and over is tedious. Is there a convenient way to generate such variables without repeating df[...] so many times, similar to what I do in R:

df <- transform(df,
    new_var_1 = var_1 + var_2,
    new_var_2 = var_2 / (var_3 + var_4),
    new_var_3 = ...
    ...
    new_var_100 = ...
)
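For comparison, the closest pandas counterpart to R's transform() is arguably DataFrame.assign, which takes one keyword argument per new column. The sketch below reuses the toy var_1 ... var_4 columns from the answer; passing lambdas means each expression receives the DataFrame under a short local name (d here, an arbitrary choice), so a long variable name is written only once. It still uses bracket indexing, just with a one-letter alias.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3, 4], [5, 6, 7, 8]],
    columns=['var_1', 'var_2', 'var_3', 'var_4'],
)

# Each keyword becomes a new column. Each lambda is called with the
# DataFrame as its argument (named d here), so the long name is not repeated.
df = df.assign(
    new_var_1=lambda d: d['var_1'] + d['var_2'],
    new_var_2=lambda d: d['var_2'] / (d['var_3'] + d['var_4']),
    # Later keywords may refer to columns created by earlier ones
    new_var_3=lambda d: d['new_var_1'] * 2,
)
print(df)
```

assign returns a new DataFrame rather than modifying df, which is why the result is assigned back.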

Since the DataFrame in my project has a much longer name than df, finding a less repetitive way to create new variables matters a lot. Thanks for your answers!

I have searched for such a method for a long time without success; every answer I found creates the variables one at a time in the same way as above.


Solution

  • Use pandas' DataFrame.eval with one assignment per line:

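    import pandas as pd

    df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]],
                      columns=['var_1', 'var_2', 'var_3', 'var_4'])

    # DataFrame.eval returns a new DataFrame by default, so assign the result back
    df = df.eval('''
        new_var_1 = var_1 + var_2
        new_var_2 = var_2 / (var_3 + var_4)
        new_var_3 = 3
        new_var_100 = 100
        ''')
    print(df)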
    

    Output:

       var_1  var_2  var_3  var_4  new_var_1  new_var_2  new_var_3  new_var_100
    0      1      2      3      4          3   0.285714          3          100
    1      5      6      7      8         11   0.400000          3          100
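If the DataFrame has a long name, a minimal sketch under that assumption is to pass inplace=True, which makes DataFrame.eval add the columns to the existing object and return None, so the long name is typed only once. The name quarterly_sales_summary below is hypothetical. Within one multi-line expression, later lines can also use columns created by earlier lines.

```python
import pandas as pd

# Hypothetical long DataFrame name, standing in for the one in the question
quarterly_sales_summary = pd.DataFrame(
    [[1, 2, 3, 4], [5, 6, 7, 8]],
    columns=['var_1', 'var_2', 'var_3', 'var_4'],
)

# inplace=True modifies the DataFrame directly (eval returns None),
# so there is nothing to assign back. new_var_2 uses new_var_1,
# which the previous line of the same expression created.
quarterly_sales_summary.eval('''
    new_var_1 = var_1 + var_2
    new_var_2 = new_var_1 * 2
    ''', inplace=True)

print(quarterly_sales_summary[['new_var_1', 'new_var_2']])
```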