How to use uproot to load referenced values (TRefArray)


I am trying to use uproot to do some basic selection from a Delphes .root output file. Delphes's C++ example code loops over events and accesses reconstructed branch elements, which have methods for reaching the branch elements that belong to other classes.

For example, the ROOT file contains a <TBranchElement b'Jet' at 0x7fb36526b3c8>. In C++, the Delphes example code uses it to get object = jet->Constituents.At(i) in a "for" loop; if object->IsA() == Tower::Class(), it then calls object->P4() to get the four-momentum. So whereas with uproot I can only get the two sets of values separately, the Delphes examples use a method of the Jet class to reach the Tower objects from which the jet was reconstructed.

The information I see is:

    Jet_size                   (no streamer)              asdtype('>i4')
    Jet                        TStreamerInfo              asdtype('>i4')
    Jet.fUniqueID              TStreamerBasicType         asjagged(asdtype('>u4'))
    ...
    Jet.Constituents           TStreamerInfo              asgenobj(SimpleArray(TRefArray))


<TBranchElement b'Jet' at 0x7fb3657825f8>

<TBranchElement b'Jet.Constituents' at 0x7fb47840cba8>

With uproot, loading the TBranchElement as an array only gives access to the elements Jet.Constituents[i], which are lists of numbers. How can I load Jet.Constituents so that it refers to the Tower.PT (or eta, phi, etc.) values it points to?


Solution

  • If you have an array of TRefs, you can use them directly as an integer index on the other collection. (See this tutorial, starting on In[29], for a general introduction to indexing by integer arrays, both in Numpy and Awkward Array.)

    That is, if you have an array of TRef, as in this example,

    import uproot
    t = uproot.open("issue324.root")["Delphes"]
    refs = t["Track.Particle"].array()
    refs.id
    # <JaggedArray [
    #      [752 766 780 ... 1813 1367 1666]
    #      ...
    #      [745 762 783 ... 1863 1713 1717]]>
    

    gives you the indexes and

    pt = t["Particle.PT"].array()
    

    the array you want to reference, so

    pt[refs.id - 1]
    # <JaggedArray [
    #      [0.7637838 1.1044897 5.463864 ... 4.252923 1.9702696 9.213475]
    #      ...
    #      [1.2523094 0.37887865 0.7390242 ... 1.0288503 3.4785874 1.804613]]>
    

    selects the pt values of interest (subtracting 1 because these indexes start at 1, whereas Python indexes start at 0).
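
To see what that jagged indexing does event by event without needing a ROOT file, here is a minimal pure-Python sketch. The helper name `resolve_refs` and the toy numbers are made up for illustration; they are not part of uproot or awkward, and the 1-based offset is simply carried over from the Track.Particle example above.

```python
def resolve_refs(values, ref_ids, offset=1):
    """For each event, look up values[event][id - offset] for every id in ref_ids[event].

    `offset` is 1 because the reference ids in the example above start at 1
    (assumption carried over from that example; check it against your own file).
    """
    return [
        [event_values[i - offset] for i in event_ids]
        for event_values, event_ids in zip(values, ref_ids)
    ]

# Toy stand-in for Particle.PT: two events with three and two particles.
particle_pt = [[10.0, 20.0, 30.0], [5.0, 7.5]]

# Toy stand-in for Track.Particle ids: one 1-based particle id per track.
track_particle_id = [[3, 1], [2, 2, 1]]

track_pt = resolve_refs(particle_pt, track_particle_id)
print(track_pt)  # [[30.0, 10.0], [7.5, 7.5, 5.0]]
```

The JaggedArray expression does the same lookup for all events at once; the loop version is only meant to show which element each reference selects.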

    If you have an array of TRefArray as in this example,

    t["Tower.Particles"].array()
    # <ObjectArray [[[1414, 1418, 1471, 1571], [1447], [1572],
    #               ...,
    #               [864, 1325], [992, 1437], [1262, 1501]]]>
    

    it's actually an ObjectArray that generates sub-arrays from the data on demand (because ROOT doesn't store doubly jagged data natively). You can convert them to native JaggedArrays by calling awkward.fromiter on them:

    import awkward
    a = awkward.fromiter(t["Tower.Particles"].array())
    # <JaggedArray [[[1414 1418 1471 1571] [1447] [1572] 
    #                ...
    #                [864 1325] [992 1437] [1262 1501]]]>
    

    and then use these doubly jagged indexes in any doubly jagged collection (in which all the numbers of elements line up, as they would for the collection you're referencing).
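
As a rough illustration of what a doubly jagged lookup means, the sketch below resolves one list of reference ids per tower into the referenced values, again in plain Python. The helper `resolve_nested_refs` and the toy data are hypothetical. Whether the TRefArray ids need the same "minus 1" correction as the TRef ids is an assumption here, so the offset is left as a parameter to check against your own file.

```python
def resolve_nested_refs(values, nested_ids, offset=1):
    """Resolve ids shaped [event][tower][constituent] into values[event][id - offset].

    offset=1 assumes the ids are 1-based like the TRef example;
    that is an assumption, not something confirmed for TRefArray.
    """
    return [
        [[event_values[i - offset] for i in ids] for ids in event_ids]
        for event_values, event_ids in zip(values, nested_ids)
    ]

# Toy stand-in for Particle.PT: one event with four particles.
particle_pt = [[10.0, 20.0, 30.0, 40.0]]

# Toy stand-in for Tower.Particles: three towers in that event,
# each holding a list of particle ids.
tower_particles = [[[1, 4], [2], [3, 3]]]

tower_particle_pt = resolve_nested_refs(particle_pt, tower_particles)
print(tower_particle_pt)  # [[[10.0, 40.0], [20.0], [30.0, 30.0]]]

# Example reduction: scalar sum of the referenced pt values per tower.
tower_sum_pt = [[sum(pts) for pts in event] for event in tower_particle_pt]
print(tower_sum_pt)  # [[50.0, 20.0, 60.0]]
```

The result keeps the shape of the index (event, tower, constituent), which is why the numbers of elements have to line up with the collection being referenced.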