uproot, awkward-array

How to use np arrays as a mask for jagged arrays (python - awkward)?


I have a ROOT file from which I would like to extract one particular candidate per event. Separately, I have a NumPy array containing the index of the candidate I want to extract from each event.

Let's say that my ROOT file has the following branch:

branch = [[8.956237 9.643666] [5.823581] [3.77208 5.6549993] [5.91686] [13.819047 14.108783]]

I want the first candidate from each of the first 4 events and the second candidate from the last event, so I have the following NumPy array:

npMask = array([[0],[0],[0],[0],[1]])

When I apply the npMask to the branch, the result is not what I expected:

branch[npMask]
[[[8.956237 9.643666]] [[8.956237 9.643666]] [[8.956237 9.643666]] [[8.956237 9.643666]] [[5.823581]]]
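
The output above is consistent with the NumPy array being applied as an ordinary integer ("fancy") index along the event axis, so each entry of npMask selects a whole event rather than a candidate inside an event. The following is only a NumPy-only sketch of that reading, not Awkward's actual implementation; `branch_obj` is a hypothetical stand-in that stores each event as an element of an object array.

```python
import numpy as np

# Hypothetical stand-in for the jagged branch: one NumPy array per event.
events = [
    [8.956237, 9.643666],
    [5.823581],
    [3.77208, 5.6549993],
    [5.91686],
    [13.819047, 14.108783],
]
branch_obj = np.empty(len(events), dtype=object)
for i, ev in enumerate(events):
    branch_obj[i] = np.array(ev)

np_mask = np.array([[0], [0], [0], [0], [1]])

# Integer-array indexing acts on the first (event) axis:
# index 0 returns event 0, index 1 returns event 1.
picked = branch_obj[np_mask]

print(picked.shape)   # shape follows the index array, not the branch
print(picked[0, 0])   # event 0, repeated for the first four entries
print(picked[4, 0])   # event 1
```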

However, if I cast the numpy array into a jagged array, it works just fine:

awkMask = awk.fromiter(npMask)

branch[awkMask]
[[8.956237] [5.823581] [3.77208] [5.91686] [14.108783]]

The problem is that this conversion takes too much time. I am using the iterate method with entrysteps of 10k, and the conversion accounts for around 65% of the time per iteration.

So, my question here is: Is there a correct way to use a numpy array as a mask for a jagged array?



Note

I create my numpy array by comparing three different branches and selecting the candidate with the highest value from those three branches, e.g.

compare1 = [[0 -0.1] [0] [0.65 0.55] [0.5] [0.6 0.9]]

compare2 = [[0.99 -0.1] [0.9] [0.45 0.2] [0.5] [0.66 0.99]]

compare3 = [[0.91 0.3] [0.77] [0.5 -0.2] [0.5] [0.87 0.59]]
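
For concreteness, one way such an index array could be computed is sketched below in plain NumPy. It assumes that "highest value" means the element-wise maximum over the three branches followed by a per-event argmax, that every event has at least one candidate, and that the jagged branches are represented by flat values plus per-event counts; the names `c1`, `c2`, `c3`, `counts`, and `local_index` are illustrative only.

```python
import numpy as np

# Flattened values of compare1, compare2, compare3 and the per-event counts.
counts = np.array([2, 1, 2, 1, 2])
c1 = np.array([0.0, -0.1, 0.0, 0.65, 0.55, 0.5, 0.6, 0.9])
c2 = np.array([0.99, -0.1, 0.9, 0.45, 0.2, 0.5, 0.66, 0.99])
c3 = np.array([0.91, 0.3, 0.77, 0.5, -0.2, 0.5, 0.87, 0.59])

# Best score of each candidate across the three branches.
best = np.maximum(np.maximum(c1, c2), c3)

# Offset of the first candidate of each event in the flat array.
starts = np.cumsum(counts) - counts

# Per-event maximum (reduceat assumes no event is empty).
event_max = np.maximum.reduceat(best, starts)

# Event number of every flat candidate.
event_of = np.repeat(np.arange(len(counts)), counts)

# Flat positions that reach their event's maximum; keep the first per event.
hits = np.flatnonzero(best == event_max[event_of])
_, first = np.unique(event_of[hits], return_index=True)

# Convert flat positions into within-event candidate indices.
local_index = hits[first] - starts
np_mask = local_index.reshape(-1, 1)
print(np_mask.tolist())
```

With the example values this reproduces the npMask shown earlier; ties inside an event resolve to the first candidate.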


Solution

  • awkward.fromiter is the one function that was allowed to be implemented with Python for loops, so it is expected to be slow. The function you want for turning regular NumPy arrays into JaggedArrays that happen to have uniform counts is JaggedArray.fromregular, which ought to be considerably faster.

    Meanwhile, your original issue is an example of an inconsistency in Awkward 0.x. In Awkward 1.x, an Awkward Array that happens to be regular and a NumPy array with the same logical meaning behave identically. You might want to consider awkward1.from_awkward0 in the awkward1 library to try it out. (It's a separate library because the interface is a little different and I don't want to break anyone's analysis!)
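
As a different technique from the fromregular route above, the one-candidate-per-event selection can also be done without building a jagged mask at all, by indexing the flat values directly. The sketch below uses hand-built NumPy stand-ins (`content`, `counts`) for the branch; in Awkward 0.x a JaggedArray exposes the corresponding `content`, `starts`, and `counts` attributes, but whether this is faster in a given analysis is an assumption to verify.

```python
import numpy as np

# Stand-ins for the jagged branch: flat values plus per-event counts.
content = np.array([
    8.956237, 9.643666,
    5.823581,
    3.77208, 5.6549993,
    5.91686,
    13.819047, 14.108783,
])
counts = np.array([2, 1, 2, 1, 2])

# Offset of each event's first candidate in the flat array.
starts = np.cumsum(counts) - counts

np_mask = np.array([[0], [0], [0], [0], [1]])
local_index = np_mask[:, 0]

# Guard against asking for a candidate that an event does not have.
if np.any(local_index < 0) or np.any(local_index >= counts):
    raise IndexError("candidate index out of range for at least one event")

# One gather on the flat array selects the requested candidate per event.
picked = content[starts + local_index]
print(picked.tolist())
```

The result is a flat array with one value per event, rather than the jagged array of length-1 lists returned by branch[awkMask].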