Suppose I have a DataFrame df with many columns, and I want to generate a large number of new variables. The naive approach I can think of is:
df['new_var_1'] = df['var_1'] + df['var_2']
df['new_var_2'] = df['var_2'] / (df['var_3'] + df['var_4'])
df['new_var_3'] = ...
...
df['new_var_100'] = ...
However, typing df[] over and over is tedious. Is there a convenient way to generate such variables without repeating df[] so many times, like what I do in R:
df <- transform(df,
new_var_1 = var_1 + var_2,
new_var_2 = var_2 / (var_3 + var_4),
new_var_3 = ...
...
new_var_100 = ...
)
Since the name of the DataFrame in my project is not as short as df, finding a better way to generate new variables matters. Thanks for your answer!
I have searched for a long time for such a way but found nothing. All the answers I found generate variables in the same repetitive way.
Use pandas's DataFrame.eval with a multi-line string, one assignment expression per line:
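import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]], columns=['var_1', 'var_2', 'var_3', 'var_4'])
# eval returns a new DataFrame by default, so assign the result back
df = df.eval('''
new_var_1 = var_1 + var_2
new_var_2 = var_2 / (var_3 + var_4)
new_var_3 = 3
new_var_100 = 100
''')
print(df)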
Output:
   var_1  var_2  var_3  var_4  new_var_1  new_var_2  new_var_3  new_var_100
0      1      2      3      4          3   0.285714          3          100
1      5      6      7      8         11   0.400000          3          100
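Since the question mentions a DataFrame with a long name, here is a minimal sketch of the same idea where the name is typed only once per call. The name long_dataframe_name and the variable names result and in_place are made up for illustration. It shows two ways to keep the new columns: capture the returned DataFrame, or pass inplace=True so the existing DataFrame is modified.

```python
import pandas as pd

# Hypothetical long name, standing in for the real DataFrame in the project.
long_dataframe_name = pd.DataFrame(
    [[1, 2, 3, 4], [5, 6, 7, 8]],
    columns=["var_1", "var_2", "var_3", "var_4"],
)

# One assignment per line; column names are used bare inside the string.
expressions = """
new_var_1 = var_1 + var_2
new_var_2 = var_2 / (var_3 + var_4)
"""

# Option 1: eval returns a new DataFrame and leaves the original untouched.
result = long_dataframe_name.eval(expressions)

# Option 2: inplace=True adds the columns to the DataFrame it is called on
# (done on a copy here so both options can be compared side by side).
in_place = long_dataframe_name.copy()
in_place.eval(expressions, inplace=True)

print(result)
```

Which option fits better depends on whether the original DataFrame needs to stay unchanged; the first is easier to chain with other method calls.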