I have a JaggedArray (awkward.array.jagged.JaggedArray) that contains indices pointing to positions in another JaggedArray. Both arrays have the same length, but the numpy.ndarrays that the JaggedArrays contain can have different lengths. I would like to reorder the second array using the indices in the first array, dropping the elements of the second array that are not indexed by the first. The first array can additionally contain values of -1 (these could be replaced by None if needed, but that is currently not the case), meaning that there is no match in the second array. In that case, the corresponding position in the result should be set to a default value (e.g. 0).
Here's a practical example and how I solve this at the moment:
import uproot
import numpy as np
import awkward

def good_index(my_indices, my_values):
    # Look up each index in my_values; -1 means "no match" and becomes 0.
    my_list = []
    for index in my_indices:
        if index > -1:
            my_list.append(my_values[index])
        else:
            my_list.append(0)
    return my_list

indices = awkward.fromiter([[0, -1], [3, 1, -1], [-1, 0, -1]])
values = awkward.fromiter([[1.1, 1.2, 1.3], [2.1, 2.2, 2.3, 2.4], [3.1]])
# Apply good_index to each pair of subarrays (a Python-level loop).
new_map = awkward.fromiter(map(good_index, indices, values))
The resulting new_map is: [[1.1 0.0] [2.4 2.2 0.0] [0.0 3.1 0.0]].
Is there a more efficient/faster way of achieving this? I was thinking that one could use numpy functionality such as numpy.where, but because the ndarrays have different lengths, this fails, at least in the ways that I tried.
If all of the subarrays in values are guaranteed to be non-empty (so that indexing with -1 returns the last subelement, not an error), then you can do this:
>>> almost = values[indices] # almost what you want; uses -1 as a real index
>>> almost.content = awkward.MaskedArray(indices.content < 0, almost.content)
>>> almost.fillna(0.0)
<JaggedArray [[1.1 0.0] [2.4 2.2 0.0] [0.0 3.1 0.0]] at 0x7fe54c713c88>
The last step is optional: without it, the missing elements are None rather than 0.0.
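For comparison, the same gather-with-default idea can be sketched in plain NumPy on flattened data, which also shows where numpy.where fits in. This is only an illustration under stated assumptions, not how awkward implements it: take_with_default is a hypothetical helper, the inputs are plain Python lists of lists, and at least one value is assumed to exist overall. Indices are not bounds-checked against their own sublist.

```python
import numpy as np

def take_with_default(indices, values, default=0.0):
    """Hypothetical helper: values[i][indices[i][j]], or `default` where the index is negative."""
    idx_counts = np.array([len(s) for s in indices])
    val_counts = np.array([len(s) for s in values])
    idx_flat = np.concatenate([np.asarray(s, dtype=np.int64) for s in indices])
    val_flat = np.concatenate([np.asarray(s, dtype=np.float64) for s in values])

    # Offset at which each sublist of `values` starts inside val_flat.
    val_starts = np.cumsum(val_counts) - val_counts
    # Repeat each offset once per index in the matching sublist of `indices`.
    starts_per_index = np.repeat(val_starts, idx_counts)

    missing = idx_flat < 0
    # Point missing entries at position 0 so the lookup stays in bounds
    # (assumes val_flat is non-empty); they are overwritten with `default`.
    global_idx = np.where(missing, 0, starts_per_index + idx_flat)
    flat_result = np.where(missing, default, val_flat[global_idx])

    # Split the flat result back into sublists shaped like `indices`.
    return np.split(flat_result, np.cumsum(idx_counts)[:-1])

indices = [[0, -1], [3, 1, -1], [-1, 0, -1]]
values = [[1.1, 1.2, 1.3], [2.1, 2.2, 2.3, 2.4], [3.1]]
result = take_with_default(indices, values)
# sublists: [1.1, 0.0], [2.4, 2.2, 0.0], [0.0, 3.1, 0.0]
```

Because missing entries are redirected before the lookup, this variant never relies on -1 wrapping around to the last element, so empty sublists in values are not a problem as long as they are only referenced by -1.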
If some of the subarrays in values are empty, you can pad them to ensure they have at least one subelement. All of the original subelements are indexed the same way they were before, since pad only increases the length, if need be.
>>> values = awkward.fromiter([[1.1, 1.2, 1.3], [], [2.1, 2.2, 2.3, 2.4], [], [3.1]])
>>> values.pad(1)
<JaggedArray [[1.1 1.2 1.3] [None] [2.1 2.2 2.3 2.4] [None] [3.1]] at 0x7fe54c713978>
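To see why the padding matters, here is a minimal plain-NumPy sketch of the same effect on individual subarrays. pad_to_at_least_one is a hypothetical stand-in for the padding step, and NaN is used as an assumed placeholder where the padded JaggedArray shows None.

```python
import numpy as np

def pad_to_at_least_one(sub, fill=np.nan):
    """Hypothetical helper: return `sub` as a float array, or [fill] if it is empty."""
    sub = np.asarray(sub, dtype=float)
    return sub if len(sub) > 0 else np.array([fill])

subarrays = [[1.1, 1.2, 1.3], [], [2.1, 2.2, 2.3, 2.4], [], [3.1]]

# Indexing an empty array with -1 raises IndexError ...
try:
    np.asarray(subarrays[1], dtype=float)[-1]
    empty_ok = True
except IndexError:
    empty_ok = False

# ... but after padding, every subarray has a last element.
padded = [pad_to_at_least_one(s) for s in subarrays]
last = [p[-1] for p in padded]

# Original elements keep their positions: padding only lengthens empty subarrays.
unchanged = padded[2][3] == 2.4
```

The placeholder values picked up by -1 are then hidden by the mask on indices.content < 0, so they never appear in the final result.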