Search code examples
pythonarraysnumpyindexingrecord

Indexing and slicing structured ndarrays


Now I'm trying to understand possible ways to index numpy structured arrays, and I kinda get stuck with it. Just a couple of simple examples:

import numpy as np
arr = np.array(zip(range(5), range(5, 10)), dtype=[('a', int), ('b', int)])

arr[0] # first row (record)
arr[(0,)] # the same, as expected

arr['a'] # field 'a' of each record
arr[('a',)] # "IndexError: unsupported iterator index" ?!

arr[1:3] # second and third rows (records)
arr[1:3, 'a'] # "ValueError: invalid literal for long() with base 10: 'a'" ?!
arr['a', 1:3] # same error
arr[..., 'a'] # here too...
arr['a', ...] # and here

So, two subquestions arise:

  • Why is the result for a plain value ('a' in this case) different from the corresponding singleton tuple (('a',))?
  • Why the last four lines raise the error? And, probably more important, how to get the slice arr['a'][1:3] with a single slice? As you can see, obvious arr['a', 1:3] doesn't work.

I also observed the indexing behavior for built-in list and non-structured ndarray, but couldn't find such issues there: putting a single value in a tuple doesn't change anything, and of course indexing like arr[1, 1:3] for plain ndarray works as expected. Given that, should the errors in my example be considered as bugs in numpy?


Solution

  • First, fields are not the same thing as dimensions - although your array arr has two fields and five rows, numpy actually treats it as one-dimensional (it has shape (5,)). Second, tuples have a special status when used as indices into numpy arrays. When you put a tuple inside the square indexing brackets, numpy interprets it as a sequence of indices into the corresponding dimensions of the array. In the special case where you have nested tuples, each inner tuple is treated as a sequence of indices into that dimension (as if it were a list).

    Since fields don't count as dimensions, when you index it with arr[('a',)], numpy interprets 'a' as an index into the rows of arr. The IndexError is therefore raised because strings aren't a valid type for indexing into a dimension of an array (what is the 'a'th row?).

    The same thing happens when you try arr['a', 1:3], because this is equivalent to indexing with the tuple ('a', slice(1, 3, None)). The comma between 'a' and 1:3 is what makes it a tuple, regardless of the lack of brackets. Again, numpy tries to index into the rows of arr with 'a', which is invalid. However, even if both elements were valid index types, you would still get an IndexError, since the length of your tuple (2) is greater than the number of dimensions in arr (1).

    arr['a'][1:3] and arr[1:3]['a'] are both perfectly valid ways to index a slice of a field.