I am having troubles with a data transformation I am trying to do. I have a column of data (Ex. 1,2,3,4,5,6,7,8,9)I want to create a new column that looks back n rows and concatenates the values into a new value, preferably an integer. For example, if the lookback window is 3 in my example, the new column would be Nan, Nan, 123, 234,345,456,567,678,789.
So far, here is a bit of code that I tried where n is the lookback window and Streak is the dataframe with the values I am looking to combine into a new value (streakHistory):
def getStreakHistory(Streak, n=20):
streakHistory= ""
for x in range(1, n + 1):
streakHistory=str(streakHistory) + str(Streak["Streak"].shift(x))
return streakHistory
df["Streak History"] = getStreakHistory(Streak)
This seems to run an error because streakHistory is a string. I have seen other options where you transpose into other cells, but I want all of the values to be combined and entered into 1 cell. Any help would be greatly appreciated. I also looked into a join, but that seems to be similar to a standard table join and not really the same as what I was looking at unless I am overlooking a particular functionality of it.
One option would be to use numpy's sliding_window_view
combined with agg
:
from numpy.lib.stride_tricks import sliding_window_view as svw
df = pd.DataFrame({'Streak': [1,2,3,4,5,6,7,8,9]})
N = 3
df['Streak History'] = (pd.DataFrame(svw(df['Streak'].astype(str), N),
index=df.index[N-1:])
.agg(''.join, axis=1)
)
Output:
Streak Streak History
0 1 NaN
1 2 NaN
2 3 123
3 4 234
4 5 345
5 6 456
6 7 567
7 8 678
8 9 789
Numeric variant:
df['Streak History'] = pd.Series((svw(df['Streak'], N)
*(10**np.arange(N-1, -1, -1))).sum(1),
index=df.index[N-1:])
Output:
Streak Streak History
0 1 NaN
1 2 NaN
2 3 123.0
3 4 234.0
4 5 345.0
5 6 456.0
6 7 567.0
7 8 678.0
8 9 789.0