I currently have a list of values and an awkward array of integer values. I want the same dimension awkward array, but where the values are the indices of the "values" arrays corresponding with the integer values of the awkward array. For instance:
values = ak.Array(np.random.rand(100))
arr = ak.Array((np.random.randint(0, 100, 33), np.random.randint(0, 100, 125)))
I want something like values[arr], but that gives the following error:
>>> values[arr]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Anaconda3\lib\site-packages\awkward\highlevel.py", line 943, in __getitem__
return ak._util.wrap(self._layout[where], self._behavior)
ValueError: cannot fit jagged slice with length 2 into RegularArray of size 100
If I run it with a loop, I get back what I want:
>>> values = ([values[i] for i in arr])
>>> values
[<Array [0.842, 0.578, 0.159, ... 0.726, 0.702] type='33 * float64'>, <Array [0.509, 0.45, 0.202, ... 0.906, 0.367] type='125 * float64'>]
Is there another way to do this, or is this it? I'm afraid it'll be too slow for my application.
Thanks!
If you're trying to avoid Python for loops for performance, note that the first line casts a NumPy array as Awkward with ak.from_numpy (no loop, very fast):
>>> values = ak.Array(np.random.rand(100))
but the second line iterates over data in Python (has a slow loop):
>>> arr = ak.Array((np.random.randint(0, 100, 33), np.random.randint(0, 100, 125)))
because a tuple of two NumPy arrays is not a NumPy array. It's a generic iterable, and the constructor falls back to ak.from_iter.
On your main question, the reason that arr
doesn't slice values
is because arr
is a jagged array and values
is not:
>>> values
<Array [0.272, 0.121, 0.167, ... 0.152, 0.514] type='100 * float64'>
>>> arr
<Array [[15, 24, 9, 42, ... 35, 75, 20, 10]] type='2 * var * int64'>
Note the types: values
has type 100 * float64
and arr
has type 2 * var * int64
. There's no rule for values[arr]
.
Since it looks like you want to slice values
with arr[0]
and then arr[1]
(from your list comprehension), it could be done in a vectorized way by duplicating values
for each element of arr
, then slicing.
>>> # The np.newaxis is to give values a length-1 dimension before concatenating.
>>> duplicated = ak.concatenate([values[np.newaxis]] * 2)
>>> duplicated
<Array [[0.272, 0.121, ... 0.152, 0.514]] type='2 * 100 * float64'>
Now duplicated
has length 2 and one level of nesting, just like arr
, so arr
can slice it. The resulting array also has length 2, but the length of each sublist is the length of each sublist in arr
, rather than 100.
>>> duplicated[arr]
<Array [[0.225, 0.812, ... 0.779, 0.665]] type='2 * var * float64'>
>>> ak.num(duplicated[arr])
<Array [33, 125] type='2 * int64'>
If you're scaling up from 2 such lists to a large number, then this would eat up a lot of memory. Then again, the size of the output of this operation would also scale as "length of values
" × "length of arr
". If this "2" is not going to scale up (if it will be at most thousands, not millions or more), then I wouldn't worry about the speed of the Python for loop. Python scales well for thousands, but not billions (depending, of course, on the size of the things being scaled!).