I can certainly do
a[a == 0] = something
that sets every entry of a that equals zero to something. Equivalently, I could write
a[np.equal(a, 0)] = something
Now, imagine a is an array of dtype=object. I cannot write a[a is None] because, of course, a itself isn't None. The intention is clear: I want the comparison is to be broadcast like any other ufunc. This list from the docs lists nothing like an is-ufunc.
Why is there none, and, more interestingly to me: what would be a performant replacement?
Except for operations like reshape and indexing that don't depend on dtype (except for the itemsize), operations on object dtype arrays are performed at list-comprehension speeds, iterating on the elements and applying an appropriate method to each. Sometimes that method doesn't exist, such as when doing np.sin.
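A minimal sketch of that per-element dispatch, using a made-up two-element array (obj is not from the session below). Arithmetic succeeds because each float has an __add__ method; np.sin fails because floats have no sin method. The exact exception type has varied between NumPy versions, so both likely types are caught here:

```python
import numpy as np

# Hypothetical example array, not part of the original session.
obj = np.array([0.0, 1.5], dtype=object)

# Arithmetic works: NumPy applies each element's own __add__.
print(obj + 1)

# np.sin looks for a sin() method on each element; Python floats have none.
try:
    np.sin(obj)
except (AttributeError, TypeError) as exc:
    print(type(exc).__name__, exc)

# Converting to a numeric dtype first lets the compiled loop run.
print(np.sin(obj.astype(float)))
```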
To illustrate, consider the array from one of the comments:
In [132]: a = np.array([1, None, 0, np.nan, ''])
In [133]: a
Out[133]: array([1, None, 0, nan, ''], dtype=object)
The object array test:
In [134]: a==None
Out[134]: array([False, True, False, False, False])
In [135]: timeit a==None
5.16 µs ± 73.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
An equivalent comprehension:
In [136]: [x is None for x in a]
Out[136]: [False, True, False, False, False]
In [137]: timeit [x is None for x in a]
1.52 µs ± 18.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
It's faster, even if we convert the result back to an array (not a cheap step):
In [138]: timeit np.array([x is None for x in a])
4.67 µs ± 95.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Iteration on the list version of the array is even faster:
In [139]: timeit np.array([x is None for x in a.tolist()])
2.52 µs ± 48.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
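One caveat on treating a==None and the is None comprehension as equivalent: == calls each element's __eq__, while is tests identity. They agree for the array above, but can differ for objects that override __eq__. A small sketch, where AlwaysEqual is a hypothetical class invented only to show the difference:

```python
import numpy as np

class AlwaysEqual:
    """Hypothetical class whose instances claim equality with anything."""
    def __eq__(self, other):
        return True

b = np.array([1, None, AlwaysEqual()], dtype=object)

# Equality test: dispatches to each element's __eq__.
mask_eq = b == None

# Identity test: only the real None matches.
mask_is = np.array([x is None for x in b.tolist()])

print(mask_eq)
print(mask_is)
```

If the elements are plain numbers, strings and None, the two masks should agree and either test can be used.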
Let's look at the full assignment action:
In [141]: a[[x is None for x in a.tolist()]]
Out[141]: array([None], dtype=object)
In [142]: %%timeit a1=a.copy()
...: a1[[x is None for x in a1.tolist()]] = np.nan
...:
...:
4.03 µs ± 10 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [143]: %%timeit a1=a.copy()
...: a1[a1==None] = np.nan
...:
...:
6.18 µs ± 28.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
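The comprehension approach can be wrapped in a small helper. This is only a sketch: is_none_mask is a name made up here, and the ravel/reshape step is an added assumption so that it also handles arrays with more than one dimension.

```python
import numpy as np

def is_none_mask(arr):
    """Boolean mask with arr's shape, True where the element is None."""
    # Flatten to a plain Python list, test identity, then restore the shape.
    flat = [x is None for x in arr.ravel().tolist()]
    return np.array(flat, dtype=bool).reshape(arr.shape)

# Hypothetical 2-d example array.
a2 = np.array([[1, None], [0, ""]], dtype=object)
mask = is_none_mask(a2)
a2[mask] = np.nan
print(mask)
print(a2)
```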
The usual caveat applies: these timings come from a five-element array, and things might scale differently for larger ones.
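To check the scaling on your own machine, something like the following could be used. The array size, the None pattern and the repeat count are arbitrary choices. The np.frompyfunc variant is not part of the timings above; it is included as one way to get an is-like elementwise function, and it returns an object array that has to be cast to bool.

```python
import timeit
import numpy as np

# Hypothetical test data: a larger object array with None scattered through it.
n = 10_000
big = np.array([None if i % 7 == 0 else i for i in range(n)], dtype=object)

# frompyfunc wraps a Python callable so it broadcasts like a ufunc.
is_none = np.frompyfunc(lambda x: x is None, 1, 1)

candidates = {
    "a == None": lambda: big == None,
    "list comprehension": lambda: np.array([x is None for x in big.tolist()]),
    "np.frompyfunc": lambda: is_none(big).astype(bool),
}

repeats = 20
for label, func in candidates.items():
    seconds = timeit.timeit(func, number=repeats)
    print(f"{label:>20}: {seconds / repeats * 1e6:.1f} µs per call")
```

All three still call Python-level code once per element, so none of them should be expected to approach the speed of a compiled loop on a numeric dtype.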