Tags: python, arrays, python-3.x, numpy, stride

Can numpy strides stride only within subarrays?


I have a very large NumPy array (145000 rows x 550 columns), and I want to create rolling slices (windows) within each row, i.e. within each subarray. I tried to implement it with a function. My lagged_vals function behaves as expected, but my np.lib.stride_tricks attempt does not behave the way I want -

import numpy as np

def lagged_vals(series, l):
    # Slow pure-Python implementation, but it gives the right result:
    # every length-l window within each row, stacked row after row
    return np.concatenate([[x[i:i+l] for i in range(x.shape[0] - l + 1)] for x in series],
                          axis=0)

# Sample 2D numpy array
something = np.array([[1,2,2,3],[2,2,3,3]])
lagged_vals(something,2) # Works as expected

# array([[1, 2],
#        [2, 2],
#        [2, 3],
#        [2, 2],
#        [2, 3],
#        [3, 3]])


# One window per start position in the flattened (C-contiguous) buffer
np.lib.stride_tricks.as_strided(something,
                               (something.size - 2 + 1, 2),
                               (something.itemsize, something.itemsize))

# array([[1, 2],
#        [2, 2],
#        [2, 3],
#        [3, 2], <--- window crosses from row 0 into row 1, which I do not want
#        [2, 2],
#        [2, 3],
#        [3, 3]])

How do I remove that particular row (the window that crosses from one row into the next) in the np.lib.stride_tricks version? And how can I make this removal of cross-row windows scale to a big NumPy array?
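For the flat attempt specifically, the unwanted rows are exactly the windows whose start position falls in the last L - 1 columns of a row, so a boolean mask on the start index can drop them. A minimal sketch, assuming a 2D input and window length L = 2 (the names flat, keep and result are just illustrative). Note that the boolean indexing makes a copy and the flat view still enumerates the invalid windows first, so this mainly shows where the extra row comes from rather than being the most efficient fix.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

something = np.array([[1, 2, 2, 3], [2, 2, 3, 3]])
L = 2  # window length

# ravel() gives a 1-D contiguous array (a view if the input is C-contiguous)
flat = something.ravel()
n_windows = flat.size - L + 1
step = flat.strides[0]  # bytes per element
windows = as_strided(flat, shape=(n_windows, L), strides=(step, step), writeable=False)

# A window stays inside its row only if it starts at column <= ncols - L
ncols = something.shape[1]
keep = (np.arange(n_windows) % ncols) <= ncols - L
result = windows[keep]  # boolean indexing returns a copy
```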


Solution

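  • Sure, that's possible with np.lib.stride_tricks.as_strided: give the view a third dimension so the windows slide only along the columns of each row, then merge the first two dimensions. Here's one way -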

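    import numpy as np
    from numpy.lib.stride_tricks import as_strided
    
    a = np.arange(8).reshape(2, 4)  # sample input (replace with your array)
    L = 2  # window length
    shp = a.shape
    strd = a.strides
    
    # (rows, windows per row, window length); the last two axes both
    # step one element along a row, so no window leaves its row
    out_shp = shp[0], shp[1]-L+1, L
    out_strd = strd + (strd[1],)
    
    out = as_strided(a, out_shp, out_strd).reshape(-1, L)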
    

    Sample input, output -

    In [177]: a
    Out[177]: 
    array([[0, 1, 2, 3],
           [4, 5, 6, 7]])
    
    In [178]: out
    Out[178]: 
    array([[0, 1],
           [1, 2],
           [2, 3],
           [4, 5],
           [5, 6],
           [6, 7]])
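To check the recipe beyond this single sample, it can be wrapped in a function and compared against a plain Python loop (similar in spirit to the question's lagged_vals) for several window lengths. A sketch; rolling_rows and rolling_rows_loop are hypothetical names, and writeable=False is an optional as_strided flag that prevents accidental writes through the overlapping view.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

def rolling_rows(a, L):
    """Length-L windows taken within each row of a 2D array, stacked into 2D."""
    rows, cols = a.shape
    s0, s1 = a.strides
    # Third axis reuses the column stride, so windows never cross a row end
    view = as_strided(a, shape=(rows, cols - L + 1, L),
                      strides=(s0, s1, s1), writeable=False)
    return view.reshape(-1, L)  # merging the first two axes forces a copy

def rolling_rows_loop(a, L):
    """Slow reference: explicit Python loops over rows and start positions."""
    return np.array([row[i:i + L] for row in a for i in range(len(row) - L + 1)])

a = np.arange(8).reshape(2, 4)
for L in (1, 2, 3, 4):
    assert np.array_equal(rolling_rows(a, L), rolling_rows_loop(a, L))
```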
    

    Note that the final reshape forces a copy, because the merged first two axes of the strided view cannot be described by a single stride. That can't be avoided if the final output has to be 2D. If a 3D output is acceptable, skip the reshape and keep a view, as shown with the sample case -

    In [181]: np.shares_memory(a, out)
    Out[181]: False
    
    In [182]: as_strided(a, out_shp, out_strd)
    Out[182]: 
    array([[[0, 1],
            [1, 2],
            [2, 3]],
    
           [[4, 5],
            [5, 6],
            [6, 7]]])
    
    In [183]: np.shares_memory(a, as_strided(a, out_shp, out_strd) )
    Out[183]: True
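On NumPy 1.20 or newer, numpy.lib.stride_tricks.sliding_window_view builds the same 3D view without hand-computed strides, and its axis argument restricts the windows to one axis, so they never cross rows. This is a swapped-in alternative to the as_strided code above, not part of it. For the big array the 2D copy is the expensive part: assuming 8-byte elements and L = 2, it holds 145000 x 549 x 2 values, about 1.27 GB. The sketch below keeps the 3D view and materialises 2D blocks of rows only on demand; the helper iter_2d_blocks and the rows_per_block parameter are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

a = np.arange(8).reshape(2, 4)
L = 2

# Windows along the column axis only; shape is (rows, cols - L + 1, L).
# The result is a read-only view of `a`, so no data is copied here.
view = sliding_window_view(a, L, axis=1)
assert view.shape == (2, 3, 2)
assert np.shares_memory(a, view)

def iter_2d_blocks(arr, L, rows_per_block):
    """Yield (n, L) copies of the row-wise windows, a block of rows at a time."""
    windows = sliding_window_view(arr, L, axis=1)
    for start in range(0, arr.shape[0], rows_per_block):
        # Only this block is copied, so extra memory is bounded by the block size
        yield windows[start:start + rows_per_block].reshape(-1, L)

blocks = list(iter_2d_blocks(a, L, rows_per_block=1))
assert np.array_equal(np.concatenate(blocks), view.reshape(-1, L))
```

Choosing rows_per_block trades peak memory against per-block Python overhead; for a 145000-row array, blocks of a few thousand rows would be one reasonable starting point.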