When I use ":n" or "m:" as arguments to np.r_, I get unexpected results that I don't understand.
Here's my code
import numpy as np
B = np.arange(180).reshape(6,30)
C = B[:, np.r_[10:15, 20:26]]
D = C[:, np.r_[0:3,8:11]]
Now all of that worked as expected. C prints as:
array([[ 10, 11, 12, 13, 14, 20, 21, 22, 23, 24, 25],
[ 40, 41, 42, 43, 44, 50, 51, 52, 53, 54, 55],
[ 70, 71, 72, 73, 74, 80, 81, 82, 83, 84, 85],
[100, 101, 102, 103, 104, 110, 111, 112, 113, 114, 115],
[130, 131, 132, 133, 134, 140, 141, 142, 143, 144, 145],
[160, 161, 162, 163, 164, 170, 171, 172, 173, 174, 175]])
and D is:
array([[ 10, 11, 12, 23, 24, 25],
[ 40, 41, 42, 53, 54, 55],
[ 70, 71, 72, 83, 84, 85],
[100, 101, 102, 113, 114, 115],
[130, 131, 132, 143, 144, 145],
[160, 161, 162, 173, 174, 175]])
However, when I remove the "0" and the "11," I don't understand what happens, and I haven't been able to find any explanation in any Numpy indexing or r_ documentation. Here's the new line of code:
E = C[:, np.r_[:3, 8:]]
It's just the same expression that defined the D array with "unnecessary" indices removed. However, the results are mystifying:
array([[ 10, 11, 12, 10, 11, 12, 13, 14, 20, 21, 22],
[ 40, 41, 42, 40, 41, 42, 43, 44, 50, 51, 52],
[ 70, 71, 72, 70, 71, 72, 73, 74, 80, 81, 82],
[100, 101, 102, 100, 101, 102, 103, 104, 110, 111, 112],
[130, 131, 132, 130, 131, 132, 133, 134, 140, 141, 142],
[160, 161, 162, 160, 161, 162, 163, 164, 170, 171, 172]])
I expected E to be identical to D, with just six columns. What's going on? Is this behavior documented somewhere, or is this a bug?
To understand the difference between D and E, we have to look at what np.r_ produces. As with function calls, the contents of an indexing expression, if complex, are evaluated first.
In [112]: D = C[:, np.r_[0:3,8:11]]; D.shape
Out[112]: (6, 6)
In [113]: E = C[:, np.r_[:3,8:]]; E.shape
Out[113]: (6, 11)
The two r_ results:
In [115]: np.r_[0:3,8:11]
Out[115]: array([ 0, 1, 2, 8, 9, 10])
In [116]: np.r_[:3,8:]
Out[116]: array([0, 1, 2, 0, 1, 2, 3, 4, 5, 6, 7])
r_ is an instance of a class defined in np.lib.index_tricks. That class has its own __getitem__ method, which lets us use indexing notation, but the real work is a call to np.concatenate.
We can see what r_ receives by using another index_tricks object, s_:
In [117]: np.s_[0:3, 8:11]
Out[117]: (slice(0, 3, None), slice(8, 11, None))
In [118]: np.s_[:3, 8:]
Out[118]: (slice(None, 3, None), slice(8, None, None))
If we define a simple function:
def foo(aslice):
    return np.arange(aslice.start, aslice.stop, aslice.step)
we can test the different slices:
In [124]: foo(np.s_[8:11]) # np.arange(8,11)
Out[124]: array([ 8, 9, 10])
In [125]: foo(np.s_[8:]) # np.arange(8)
Out[125]: array([0, 1, 2, 3, 4, 5, 6, 7])
Remember that when we give arange just one number, it's understood to be the 'stop', with an implicit start of 0. That's the same as Python's built-in range.
np.r_ actually uses:
In [105]: def foo1(item):
     ...:     step = item.step
     ...:     start = item.start
     ...:     stop = item.stop
     ...:     if start is None:
     ...:         start = 0
     ...:     if step is None:
     ...:         step = 1
     ...:     return np.arange(start, stop, step)
but this just lets us use np.r_[:3] instead of np.r_[0:3]. It doesn't change the [8:] case.
In case it isn't clear: A[i,j] is translated by the interpreter into A.__getitem__((i,j)), a function call. The interpreter also converts any ':' notation into a slice(...) object, as illustrated by s_.
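To make that translation concrete, here is a minimal sketch of a class whose __getitem__ simply returns whatever the interpreter hands it. The class name ShowIndex is made up for this illustration; it is not part of NumPy.

```python
class ShowIndex:
    """Hypothetical helper: echo back the key passed to __getitem__."""

    def __getitem__(self, key):
        # 'key' is exactly what the interpreter built from the brackets:
        # a single object, or a tuple when commas are present.
        return key


show = ShowIndex()
print(show[0:3, 8:11])  # two fully specified slices
print(show[:3, 8:])     # omitted bounds arrive as None
```

The object has no array attached, so the None in slice(8, None, None) carries no information about where the slice should end. np.r_ is in the same position.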
After converting the slices into arrays with np.arange, or np.linspace (for 'complex' steps), r_ concatenates them. So your two r_ expressions are really:
In [128]: np.concatenate([np.arange(0,3), np.arange(8,11)]) # [115]
Out[128]: array([ 0, 1, 2, 8, 9, 10])
In [129]: np.concatenate([np.arange(0,3), np.arange(8,None)]) # [116]
Out[129]: array([0, 1, 2, 0, 1, 2, 3, 4, 5, 6, 7])
I suppose one could argue that np.r_[8:] should raise an error, since it provides a start without a stop, and thus can't be evaluated as it would be in a real indexing case. As coded, it works because of the default behavior of np.arange.
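If you would rather have the open-ended case fail loudly, a small wrapper can do the check before expanding anything. This is only a sketch: strict_r is a hypothetical helper, not a NumPy function, and it handles only slices and plain integers or sequences.

```python
import numpy as np


def strict_r(*items):
    """Hypothetical helper: expand slices like a tiny np.r_, but reject
    slices that have no stop instead of silently reinterpreting them."""
    parts = []
    for item in items:
        if isinstance(item, slice):
            if item.stop is None:
                raise ValueError(f"open-ended {item!r} has no stop to expand to")
            start = 0 if item.start is None else item.start
            step = 1 if item.step is None else item.step
            parts.append(np.arange(start, item.stop, step))
        else:
            # Scalars and sequences are passed through as 1-d arrays.
            parts.append(np.atleast_1d(item))
    return np.concatenate(parts)


print(strict_r(np.s_[:3], np.s_[8:11]).tolist())  # [0, 1, 2, 8, 9, 10]
try:
    strict_r(np.s_[:3], np.s_[8:])
except ValueError as err:
    print("rejected:", err)
```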
When I use '8:' directly, C can deduce the correct stop from its own shape:
In [140]: C.shape
Out[140]: (6, 11)
In [141]: C[:,8:].shape
Out[141]: (6, 3)
But an np.r_ object does not have a shape, nor can it deduce the shape from C, because it is evaluated before C's indexing ever sees it:
In [142]: np.r_.shape
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Input In [142], in <cell line: 1>()
----> 1 np.r_.shape
AttributeError: 'RClass' object has no attribute 'shape'
If you want to avoid the explicit 11, you have to use:
In [143]: C[:, np.r_[8:C.shape[1]]].shape
Out[143]: (6, 3)
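Another option, if you write this kind of thing often, is to resolve the slices against the axis length yourself. Python's slice objects have an indices(length) method that fills in the missing bounds for a sequence of that length. The sketch below uses it; the name r_for_axis is made up for this example, and it assumes every argument is a slice.

```python
import numpy as np


def r_for_axis(length, *slices):
    """Hypothetical helper: expand slices into one index array,
    filling in omitted bounds from the given axis length."""
    # slice.indices(length) returns a concrete (start, stop, step) tuple.
    return np.concatenate([np.arange(*s.indices(length)) for s in slices])


B = np.arange(180).reshape(6, 30)
C = B[:, np.r_[10:15, 20:26]]

cols = r_for_axis(C.shape[1], np.s_[:3], np.s_[8:])
E = C[:, cols]
print(cols.tolist())  # [0, 1, 2, 8, 9, 10]
print(E.shape)        # (6, 6)
```

Here E matches the D array from the question, because the open-ended 8: is resolved against C's 11 columns before np.arange is called.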