Search code examples
pythonpandasdataframetranspose

Transpose a rolling set of values in a column into a single value in a cell


I am having troubles with a data transformation I am trying to do. I have a column of data (Ex. 1,2,3,4,5,6,7,8,9)I want to create a new column that looks back n rows and concatenates the values into a new value, preferably an integer. For example, if the lookback window is 3 in my example, the new column would be Nan, Nan, 123, 234,345,456,567,678,789.

So far, here is a bit of code that I tried where n is the lookback window and Streak is the dataframe with the values I am looking to combine into a new value (streakHistory):

def getStreakHistory(Streak, n=20):
    streakHistory= ""
    for x in range(1, n + 1):
         streakHistory=str(streakHistory) + str(Streak["Streak"].shift(x))
    return streakHistory

df["Streak History"] = getStreakHistory(Streak)

This seems to run an error because streakHistory is a string. I have seen other options where you transpose into other cells, but I want all of the values to be combined and entered into 1 cell. Any help would be greatly appreciated. I also looked into a join, but that seems to be similar to a standard table join and not really the same as what I was looking at unless I am overlooking a particular functionality of it.


Solution

  • One option would be to use 's sliding_window_view combined with agg:

    from numpy.lib.stride_tricks import sliding_window_view as svw
    
    df = pd.DataFrame({'Streak': [1,2,3,4,5,6,7,8,9]})
    N = 3
    
    df['Streak History'] = (pd.DataFrame(svw(df['Streak'].astype(str), N),
                                         index=df.index[N-1:])
                              .agg(''.join, axis=1)
                           )
    

    Output:

       Streak Streak History
    0       1            NaN
    1       2            NaN
    2       3            123
    3       4            234
    4       5            345
    5       6            456
    6       7            567
    7       8            678
    8       9            789
    

    Numeric variant:

    df['Streak History'] = pd.Series((svw(df['Streak'], N)
                                      *(10**np.arange(N-1, -1, -1))).sum(1),
                                        index=df.index[N-1:])
    

    Output:

       Streak  Streak History
    0       1             NaN
    1       2             NaN
    2       3           123.0
    3       4           234.0
    4       5           345.0
    5       6           456.0
    6       7           567.0
    7       8           678.0
    8       9           789.0