I have a tree with one branch storing a string. When I read it using uproot.open() and then the method arrays(), I get the following:
>>> array_train['backtracked_end_process']
<ObjectArray [b'FastScintillation' b'FastScintillation' b'FastScintillation' ... b'FastScintillation' b'FastScintillation' b'FastScintillation'] at 0x7f48936e6c90>
I would like to use this branch to create masks, by doing things like
array_train['backtracked_end_process'] != b'FastScintillation'
but unfortunately this produces an error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-97-a28f3706c5b5> in <module>
----> 1 array_train['backtracked_end_process'] == b'FastScintillation'
~/.local/lib/python3.7/site-packages/numpy/lib/mixins.py in func(self, other)
23 if _disables_array_ufunc(other):
24 return NotImplemented
---> 25 return ufunc(self, other)
26 func.__name__ = '__{}__'.format(name)
27 return func
~/.local/lib/python3.7/site-packages/awkward/array/objects.py in __array_ufunc__(self, ufunc, method, *inputs, **kwargs)
216 contents.append(x)
217
--> 218 result = getattr(ufunc, method)(*contents, **kwargs)
219
220 if self._util_iscomparison(ufunc):
~/.local/lib/python3.7/site-packages/awkward/array/jagged.py in __array_ufunc__(self, ufunc, method, *inputs, **kwargs)
987 data = self._util_toarray(inputs[i], inputs[i].dtype)
988 if starts.shape != data.shape:
--> 989 raise ValueError("cannot broadcast JaggedArray of shape {0} with array of shape {1}".format(starts.shape, data.shape))
990
991 if parents is None:
ValueError: cannot broadcast JaggedArray of shape (24035,) with array of shape ()
Does anyone have any suggestion on how to proceed? Being able to transform it to a numpy.chararray would already solve the problem, but I don't know how to do that.
String-handling is a weak point in uproot. It uses a custom ObjectArray (not even the StringArray in awkward-array), which generates bytes objects on demand. What you'd like is an array-of-strings class with == overloaded to mean "compare each variable-length string, broadcasting a single string to an array if necessary." Unfortunately, neither the uproot ObjectArray of strings nor the StringArray class in awkward-array does that yet.
So here's how you can do it, admittedly through an implicit Python for loop.
>>> import uproot, numpy
>>> f = uproot.open("http://scikit-hep.org/uproot/examples/sample-6.10.05-zlib.root")
>>> t = f["sample"]
>>> t["str"].array()
<ObjectArray [b'hey-0' b'hey-1' b'hey-2' ... b'hey-27' b'hey-28' b'hey-29'] at 0x7fe835b54588>
>>> numpy.array(list(t["str"].array()))
array([b'hey-0', b'hey-1', b'hey-2', b'hey-3', b'hey-4', b'hey-5',
b'hey-6', b'hey-7', b'hey-8', b'hey-9', b'hey-10', b'hey-11',
b'hey-12', b'hey-13', b'hey-14', b'hey-15', b'hey-16', b'hey-17',
b'hey-18', b'hey-19', b'hey-20', b'hey-21', b'hey-22', b'hey-23',
b'hey-24', b'hey-25', b'hey-26', b'hey-27', b'hey-28', b'hey-29'],
dtype='|S6')
>>> numpy.array(list(t["str"].array())) == b"hey-0"
array([ True, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False,
False, False, False])
The loop is implicit in the list constructor, which iterates over the ObjectArray and turns each element into a bytes string. A Python list is not good for array-at-a-time operations, so we then construct a NumPy array, which is (at the cost of padding every string to the length of the longest one).
Alternative, probably better:
While writing this, I remembered that uproot's ObjectArray is implemented using an awkward JaggedArray, so the transformation above can be performed with JaggedArray's regular method, which is probably much faster (no intermediate Python bytes objects, no Python for loop).
>>> t["str"].array().regular()
array([b'hey-0', b'hey-1', b'hey-2', b'hey-3', b'hey-4', b'hey-5',
b'hey-6', b'hey-7', b'hey-8', b'hey-9', b'hey-10', b'hey-11',
b'hey-12', b'hey-13', b'hey-14', b'hey-15', b'hey-16', b'hey-17',
b'hey-18', b'hey-19', b'hey-20', b'hey-21', b'hey-22', b'hey-23',
b'hey-24', b'hey-25', b'hey-26', b'hey-27', b'hey-28', b'hey-29'],
dtype=object)
>>> t["str"].array().regular() == b"hey-0"
array([ True, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False,
False, False, False])
(The functionality described above wasn't created intentionally, but it works because the right pieces compose in a fortuitous way.)
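The two outputs above differ in dtype ('|S6' versus object), yet both support the same == comparison. The sketch below mimics the two dtypes with plain NumPy arrays to show that the resulting masks agree. It does not call regular() itself, and the sample strings are chosen only for illustration.

```python
import numpy

raw = [b"hey-0", b"hey-1", b"hey-10"]

# Fixed-width bytes dtype, like the result of numpy.array(list(...)).
padded = numpy.array(raw)

# Object dtype holding Python bytes objects, like the output shown for regular().
unpadded = numpy.array(raw, dtype=object)

# Both comparisons are elementwise and yield a boolean mask.
mask_padded = padded == b"hey-1"
mask_unpadded = unpadded == b"hey-1"
```

Either mask can be used for selection. If a fixed-width array is needed later (for instance to save it compactly), an object array of bytes can be converted with unpadded.astype("S"), at which point the padding cost applies again.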