Tags: python, numpy, numpy-ndarray, numpy-slicing

Unexpected results from Numpy r_


When I use ":n" or "m:" as arguments to np.r_, I get unexpected results that I don't understand.

Here's my code:

import numpy as np  
B = np.arange(180).reshape(6,30)
C = B[:, np.r_[10:15, 20:26]]
D = C[:, np.r_[0:3,8:11]]

Now all of that worked as expected. C prints as:

array([[ 10,  11,  12,  13,  14,  20,  21,  22,  23,  24,  25],
       [ 40,  41,  42,  43,  44,  50,  51,  52,  53,  54,  55],
       [ 70,  71,  72,  73,  74,  80,  81,  82,  83,  84,  85],
       [100, 101, 102, 103, 104, 110, 111, 112, 113, 114, 115],
       [130, 131, 132, 133, 134, 140, 141, 142, 143, 144, 145],
       [160, 161, 162, 163, 164, 170, 171, 172, 173, 174, 175]])

and D is:

array([[ 10,  11,  12,  23,  24,  25],
       [ 40,  41,  42,  53,  54,  55],
       [ 70,  71,  72,  83,  84,  85],
       [100, 101, 102, 113, 114, 115],
       [130, 131, 132, 143, 144, 145],
       [160, 161, 162, 173, 174, 175]])

However, when I remove the "0" and the "11", the result is something I don't understand, and I haven't been able to find an explanation in the NumPy indexing or r_ documentation. Here's the new line of code:

E = C[:, np.r_[:3, 8:]]

It's just the same expression that defined the D array, with the "unnecessary" indices removed. However, the results are mystifying:

array([[ 10,  11,  12,  10,  11,  12,  13,  14,  20,  21,  22],
       [ 40,  41,  42,  40,  41,  42,  43,  44,  50,  51,  52],
       [ 70,  71,  72,  70,  71,  72,  73,  74,  80,  81,  82],
       [100, 101, 102, 100, 101, 102, 103, 104, 110, 111, 112],
       [130, 131, 132, 130, 131, 132, 133, 134, 140, 141, 142],
       [160, 161, 162, 160, 161, 162, 163, 164, 170, 171, 172]])

I expected E to be identical to D, with just six columns. What's going on? Is this behavior documented somewhere, or is this a bug?


Solution

  • To understand the difference between D and E we have to look at what np.r_ produces. As with function calls, the 'contents' of an index expression, if complex, are evaluated first.

    In [112]: D = C[:, np.r_[0:3,8:11]]; D.shape
    Out[112]: (6, 6)
    In [113]: E = C[:, np.r_[:3,8:]]; E.shape
    Out[113]: (6, 11)
    

    The two r_ expressions:

    In [115]: np.r_[0:3,8:11]
    Out[115]: array([ 0,  1,  2,  8,  9, 10])    
    In [116]: np.r_[:3,8:]
    Out[116]: array([0, 1, 2, 0, 1, 2, 3, 4, 5, 6, 7])
    

    r_ is an instance of a class defined in np.lib.index_tricks. That class has its own __getitem__ method, which lets us use indexing notation, but the work is actually done by a call to np.concatenate.
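
    For example, continuing the session above, indexing r_ is just a call to its __getitem__ with slice objects; spelling that call out (purely for illustration) gives the same array:

    a = np.r_[0:3, 8:11]
    b = np.r_.__getitem__((slice(0, 3), slice(8, 11)))
    print(a)                     # [ 0  1  2  8  9 10]
    print(np.array_equal(a, b))  # True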

    We can see what r_ receives by using another index_tricks object, np.s_:

    In [117]: np.s_[0:3, 8:11]
    Out[117]: (slice(0, 3, None), slice(8, 11, None))    
    In [118]: np.s_[:3, 8:]
    Out[118]: (slice(None, 3, None), slice(8, None, None))
    

    If we define a simple function:

    def foo(aslice):
        return np.arange(aslice.start, aslice.stop, aslice.step)
    

    we can test the different slices:

    In [124]: foo(np.s_[8:11])            # np.arange(8,11)
    Out[124]: array([ 8,  9, 10])
    
    In [125]: foo(np.s_[8:])              # np.arange(8)
    Out[125]: array([0, 1, 2, 3, 4, 5, 6, 7])
    

    Remember that when we give arange just one number, it's understood to be the 'stop', with an implicit 0 start. That's the same as Python's built-in range.
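
    A quick illustration of that default:

    print(np.arange(8))    # [0 1 2 3 4 5 6 7]
    print(list(range(8)))  # [0, 1, 2, 3, 4, 5, 6, 7]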

    np.r_ actually uses:

    In [105]: def foo1(item):
         ...:     step = item.step
         ...:     start = item.start
         ...:     stop = item.stop
         ...:     if start is None:
         ...:         start = 0
         ...:     if step is None:
         ...:         step = 1
         ...:     return np.arange(start, stop, step)
    

    but this just lets us use np.r_[:3] instead of np.r_[0:3]. It doesn't change the [8:] case.
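
    A quick sanity check, reusing the foo1 helper above, confirms that:

    print(foo1(np.s_[:3]))  # [0 1 2]            -- start=None becomes 0
    print(foo1(np.s_[8:]))  # [0 1 2 3 4 5 6 7]  -- stop=None falls through to arange's default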

    In case it isn't clear: A[i,j] is translated by the interpreter into A.__getitem__((i,j)), a function call. The interpreter also converts any ':' or '::' notation into a slice(...) object, as illustrated by s_.
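
    A tiny demonstration class (purely illustrative) makes that translation visible by returning whatever key __getitem__ receives:

    class Spy:
        def __getitem__(self, key):
            # the interpreter has already built the tuple of slices for us
            return key

    spy = Spy()
    print(spy[0:3, 8:11])  # (slice(0, 3, None), slice(8, 11, None))
    print(spy[:3, 8:])     # (slice(None, 3, None), slice(8, None, None))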

    After converting the slices into arrays with np.arange, or np.linspace for 'complex' steps, r_ does a concatenate.

    So your two r_ expressions are really:

    In [128]: np.concatenate([np.arange(0,3), np.arange(8,11)])    # [115]
    Out[128]: array([ 0,  1,  2,  8,  9, 10])
    
    In [129]: np.concatenate([np.arange(0,3), np.arange(8,None)])   # [116]
    Out[129]: array([0, 1, 2, 0, 1, 2, 3, 4, 5, 6, 7])
    

    I suppose one could argue that np.r_[8:] should raise an error, since it provides a start without a stop, and thus can't be evaluated as it would in a real indexing case. As coded, it works because of the default behavior of np.arange.
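
    If you wanted that stricter behavior, a small wrapper could enforce it. This is just a sketch, not anything numpy provides:

    def strict_r(*slices):
        """Refuse open-ended slices instead of silently calling arange(start)."""
        parts = []
        for s in slices:
            if s.stop is None:
                raise ValueError(f"slice {s} has no stop and cannot be expanded")
            start = 0 if s.start is None else s.start
            step = 1 if s.step is None else s.step
            parts.append(np.arange(start, s.stop, step))
        return np.concatenate(parts)

    print(strict_r(np.s_[:3], np.s_[8:11]))  # [ 0  1  2  8  9 10]
    # strict_r(np.s_[8:])                    # raises ValueError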

    edit

    When '8:' is used directly as an index, C can deduce the correct stop from its own shape:

    In [140]: C.shape
    Out[140]: (6, 11)
    
    In [141]: C[:,8:].shape
    Out[141]: (6, 3)
    

    But an np.r_ object does not have a shape, nor can it deduce the shape from C:

    In [142]: np.r_.shape
    ---------------------------------------------------------------------------
    AttributeError                            Traceback (most recent call last)
    Input In [142], in <cell line: 1>()
    ----> 1 np.r_.shape
    
    AttributeError: 'RClass' object has no attribute 'shape'
    

    If you want to avoid the explicit 11, you have to use:

    In [143]: C[:, np.r_[8:C.shape[1]]].shape
    Out[143]: (6, 3)
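
    An alternative sketch: slice.indices resolves None endpoints against a given length, so a small helper can keep the open-ended notation while supplying C's width explicitly (reusing C and D from above):

    def cols(n, *slices):
        # expand each slice relative to an axis of length n, then concatenate
        return np.concatenate([np.arange(*s.indices(n)) for s in slices])

    E2 = C[:, cols(C.shape[1], np.s_[:3], np.s_[8:])]
    print(np.array_equal(E2, D))  # True -- the six columns originally expected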