Pandas: Aggregating column slices as arrays


I have a pandas DataFrame that looks like this:

                      Scaled
               Date 
2020-07-01 02:40:00 0.604511
2020-07-01 02:45:00 0.640577
2020-07-01 02:50:00 0.587683
2020-07-01 02:55:00 0.491515
....

I am trying to add a new column called X in which each row holds the two previous values of Scaled as an array, like this:

                      Scaled   X
               Date 
2020-07-01 02:40:00 0.604511 nan
2020-07-01 02:45:00 0.640577 nan
2020-07-01 02:50:00 0.587683 [0.604511 0.640577]
2020-07-01 02:55:00 0.491515 [0.640577 0.587683]
...

I am using a for-loop to do this, but it is not working as intended, and I don't think it is the most elegant or efficient approach anyway. Is there a way to do this in pandas?

window_size = 2
for i in range(window_size, df.shape[0]):
    df['X'][i] = df['Scaled'][i - window_size:window_size] 

Solution

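  • In pandas, you can build the windows with a list comprehension, concat, and shift. Each shift(-i) column supplies the i-th element of a window, and the final shift(window_size) moves every window down so that it holds the previous values: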

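    window_size = 2
    # Column i is Scaled shifted up by i rows; shifting the combined frame down
    # by window_size leaves each row holding the previous window_size values.
    df['X'] = (pd.concat([df.Scaled.shift(-i) for i in range(window_size)], axis=1)
                 .shift(window_size).values.tolist())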
    
    Out[213]:
         Scaled                               X
    0  0.604511                      [nan, nan]
    1  0.640577                      [nan, nan]
    2  0.587683  [0.604511, 0.6405770000000001]
    3  0.491515  [0.6405770000000001, 0.587683]
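
For anyone who wants to try this end to end, here is a minimal self-contained sketch. It rebuilds the four sample rows from the question (assuming Date is a DatetimeIndex, which the question does not state), applies the concat/shift approach, and, for comparison, runs a loop with corrected slice bounds. The names windows and x_loop are made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question; a DatetimeIndex is an assumption.
idx = pd.to_datetime([
    "2020-07-01 02:40:00",
    "2020-07-01 02:45:00",
    "2020-07-01 02:50:00",
    "2020-07-01 02:55:00",
])
df = pd.DataFrame({"Scaled": [0.604511, 0.640577, 0.587683, 0.491515]}, index=idx)
df.index.name = "Date"

window_size = 2

# Column i holds Scaled shifted up by i rows, so each row is a
# forward-looking window starting at that row.
windows = pd.concat([df["Scaled"].shift(-i) for i in range(window_size)], axis=1)

# Shifting down by window_size turns it into the window of previous values.
df["X"] = windows.shift(window_size).values.tolist()

# Loop version for comparison (hypothetical helper list x_loop).
# The slice has to end at i, not at window_size, and building a plain list
# first avoids the chained assignment df['X'][i] = ... from the question.
x_loop = [np.nan] * len(df)
for i in range(window_size, len(df)):
    x_loop[i] = df["Scaled"].iloc[i - window_size:i].tolist()

print(df)
print(x_loop)
```

Note that the concat/shift version fills the first window_size rows with a list of NaNs rather than a single NaN, as in the output above, while the loop version leaves a plain NaN there.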