I would like to create three new columns in an existing pandas dataframe (df
) by using the same lambda
function on the same column only with different input. I have succeeded that by the following lines of code. Nevertheless, I would like to know whether there is a quicker way to produce the same output with fewer lines of code:
df["1d"] = (
df
.groupby(cols)["ln"]
.apply(lambda x: x - x.shift(1))
.values
)
df["5d"] = (
df
.groupby(cols)["ln"]
.apply(lambda x: x - x.shift(5))
.values
)
df["30d"] = (
df
.groupby(cols)["ln"]
.apply(lambda x: x - x.shift(30))
.values
)
You have to loop over the shifts, but you could reuse the grouper:
shifts = [1, 5, 30]
g = df.groupby(cols)['ln']
for s in shifts:
df[f'{s}d'] = df['ln']-g.shift(s)
Also, your approach to use groupby.apply
+ .values
is incorrect. This would only work if the rows are already sorted by group. If you use groupby.shift
you will maintain a correct index/alignement.
If you really need to use a custom function, go with groupby.transform
:
shifts = [1, 5, 30]
g = df.groupby(cols)['ln']
for s in shifts:
df[f'{s}d'] = g.transform(lambda x: x-x.shift(s))
Example with a dummy input and cols = ['A', 'B']
:
A B ln 1d 5d 30d
0 0 0 0 NaN NaN NaN
1 0 0 2 2.0 NaN NaN
2 0 0 10 8.0 NaN NaN
3 0 1 6 NaN NaN NaN
4 0 1 8 2.0 NaN NaN
5 0 1 9 1.0 NaN NaN
6 0 1 15 6.0 NaN NaN
7 1 0 4 NaN NaN NaN
8 1 0 19 15.0 NaN NaN
9 1 1 1 NaN NaN NaN
10 1 1 3 2.0 NaN NaN
11 1 1 12 9.0 NaN NaN
12 1 1 16 4.0 NaN NaN
13 1 1 17 1.0 NaN NaN
14 1 1 18 1.0 17.0 NaN
15 2 0 7 NaN NaN NaN
16 2 0 11 4.0 NaN NaN
17 2 0 13 2.0 NaN NaN
18 2 1 5 NaN NaN NaN
19 2 1 14 9.0 NaN NaN