
extracting lag features of numpy array (+ expand dimension) | reshape numpy array with stride=1


I have a time-series data array of shape (#timestamps, #features). For each row (timestamp), I would like to extract the n_lag previous rows and reshape the array to shape (#samples, #lags+now, #features), to use as input for a Keras LSTM layer.

Consider this toy example:

import numpy as np
n_rows = 6
n_feat = 3
n_lag = 2

a = np.array(range(n_rows*n_feat)).reshape(n_rows, n_feat)

>>> a.shape
(6, 3)
>>> a
array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14],
       [15, 16, 17]])

By iterating over the rows, I get the expected output:

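b = np.empty(shape=(0, (n_lag + 1), n_feat))
for idx, row in enumerate(a):
    # Up to n_lag previous rows plus the current row, with a leading sample axis
    temp = np.expand_dims(a[max(0, idx-n_lag):idx+1, :], 0)
    # Keep only complete windows; the first n_lag windows are too short
    if temp.shape[1:] == b.shape[1:]:
        b = np.append(b, temp, axis=0)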


>>> b.shape
(4, 3, 3)
>>> b
array([[[ 0.,  1.,  2.],
        [ 3.,  4.,  5.],
        [ 6.,  7.,  8.]],

       [[ 3.,  4.,  5.],
        [ 6.,  7.,  8.],
        [ 9., 10., 11.]],

       [[ 6.,  7.,  8.],
        [ 9., 10., 11.],
        [12., 13., 14.]],

       [[ 9., 10., 11.],
        [12., 13., 14.],
        [15., 16., 17.]]])

Note: the first n_lag rows do not have enough history to form a complete window and are discarded, so the output has n_rows - n_lag samples.
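The note can be made concrete with a small loop-based sketch (not from the original post): a complete window ending at row t needs rows t - n_lag through t, so t starts at n_lag. Collecting the windows in a list and stacking them once also avoids np.append, which copies the whole growing array on every iteration. Unlike the loop above, this keeps a's integer dtype; the name `windows` is illustrative.

```python
import numpy as np

n_rows, n_feat, n_lag = 6, 3, 2
a = np.arange(n_rows * n_feat).reshape(n_rows, n_feat)

# A complete window ending at row t covers rows t - n_lag .. t, so t starts at n_lag
# and there are n_rows - n_lag samples.
windows = [a[t - n_lag : t + 1] for t in range(n_lag, n_rows)]

# Stack once at the end instead of calling np.append on every iteration.
b = np.stack(windows)   # shape (n_rows - n_lag, n_lag + 1, n_feat)
print(b.shape)          # (4, 3, 3)
```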

Question: is there a more elegant way to do this than iterating over the rows?


Solution

  • You can use np.lib.stride_tricks.sliding_window_view (added in NumPy 1.20) for this:

    import numpy as np

    n_rows = 6
    n_feat = 3
    n_lag = 2

    a = np.array(range(n_rows*n_feat)).reshape(n_rows, n_feat)

    # The window spans n_lag + 1 rows and all n_feat columns. The raw result has
    # shape (n_rows - n_lag, 1, n_lag + 1, n_feat), so drop the singleton axis 1.
    b = np.lib.stride_tricks.sliding_window_view(a, window_shape=(n_lag + 1, n_feat))[:, 0]
    b


    output:

    array([[[ 0,  1,  2],
            [ 3,  4,  5],
            [ 6,  7,  8]],

           [[ 3,  4,  5],
            [ 6,  7,  8],
            [ 9, 10, 11]],

           [[ 6,  7,  8],
            [ 9, 10, 11],
            [12, 13, 14]],

           [[ 9, 10, 11],
            [12, 13, 14],
            [15, 16, 17]]])
    

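    b is a view of a: sliding_window_view only changes the shape and strides, so the same memory locations of a appear several times in b and no new array is allocated. The view is read-only by default (writeable=False); call b.copy() if you need a writable, independent array.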
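To make the shape-and-strides point concrete, here is a minimal sketch that builds the same view by hand with np.lib.stride_tricks.as_strided (also available in NumPy versions older than 1.20). The sample axis and the lag axis both advance by one row's worth of bytes, so neighbouring windows overlap in memory. This is an illustration, not the answer's method; as_strided performs no bounds checking, so a wrong shape or stride silently reads memory outside a.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided, sliding_window_view

n_rows, n_feat, n_lag = 6, 3, 2
a = np.arange(n_rows * n_feat).reshape(n_rows, n_feat)   # C-contiguous

row_stride, col_stride = a.strides   # bytes to move one row / one column in a
n_samples = n_rows - n_lag

# Sample axis and lag axis both advance by one row, so consecutive windows
# overlap in memory; nothing is copied.
b = as_strided(
    a,
    shape=(n_samples, n_lag + 1, n_feat),
    strides=(row_stride, row_stride, col_stride),
    writeable=False,   # read-only, like sliding_window_view's default
)

# Same windows as sliding_window_view with the singleton axis 1 dropped
swv = sliding_window_view(a, (n_lag + 1, n_feat))[:, 0]

print(b.shape)                  # (4, 3, 3)
print(np.shares_memory(a, b))   # True
print(np.array_equal(b, swv))   # True
```

Because both versions are views, writing to a afterwards changes b as well; copy first if the windows must stay fixed.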