Can anyone explain the following result to me? I know this is not how one would usually perform this operation, but I found the result odd.
import numpy as np
a = np.ma.masked_where(np.arange(20)>10,np.arange(20))
b = np.ma.masked_where(np.arange(20)>-1,np.arange(20))
c = np.zeros(a.shape)
d = np.zeros(a.shape)
c[~a.mask] += b[~a.mask]
print(b[~a.mask])
#masked_array(data=[--, --, --, --, --, --, --, --,--, --, --],
# mask=[ True, True, True, True, True, True, True, True, True, True, True],
# fill_value=999999,
# dtype=int64)
print(c)
#[ 0. 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
d[~a.mask] = d[~a.mask] + b[~a.mask]
print(d)
#[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
I expected c to not change, but I guess there is something related to objects in memory going on here. Also, += keeps the original object, while = and + create a new d.
I just don't really understand where the data that is added to c comes from.
I will start with a simpler example for better understanding:
b = np.ma.masked_where(np.arange(20)>-1,np.arange(20))
#b: [-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --]
#b.data: [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19]
c = np.zeros(b.shape)
#c: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
d = np.zeros(b.shape)
#d: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
c += b
#c: [ 0. 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16. 17. 18. 19.]
d = d + b
#d: [-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --]
#d.data: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
The first operation, c += b, is an in-place operation. In other words, it is equivalent to c = type(c).__iadd__(c, b), which performs the addition according to the type of c. Since c is a plain ndarray and not a masked array, the mask of b is ignored and the underlying data of b is added as if it were unmasked.
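A minimal sketch of the in-place case, assuming a NumPy version that behaves as shown in the question. The names c_before and safe are introduced here only for illustration, and filled(0) is just one possible way to keep masked entries out of the sum:

```python
import numpy as np

b = np.ma.masked_where(np.arange(20) > -1, np.arange(20))  # every element masked
c = np.zeros(b.shape)
c_before = c          # hypothetical extra name, kept to check object identity

c += b                # handled by ndarray.__iadd__, which ignores the mask of b

print(c is c_before)  # the same array object was modified in place
print(type(c))        # still a plain ndarray
print(c)              # the underlying data of b has been added

# One way to skip masked entries: replace them with 0 before adding.
safe = np.zeros(b.shape)
safe += b.filled(0)
print(safe)
```

If the mask matters, it seems safer to make the fill value explicit like this than to rely on how an in-place operator treats a masked operand.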
On the other hand, d = d + b is NOT an in-place assignment. It is equivalent to d = np.ma.MaskedArray.__add__(d, b) (more precisely, since MaskedArray is a subclass of ndarray, Python tries the reflected method of b first, i.e. b.__radd__(d)). This creates a new object, and the result takes the more specific of the two operand types: because b is a masked array, the unmasked array d is treated as a masked array and the result is a masked array. The addition therefore uses valid values only (of which there are none in this case, since ALL elements of b are masked and invalid). The result is a masked array d with the same mask as b, while the data of d remains unchanged.
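The rebinding described above can be checked with a short sketch. The name d_before is hypothetical and only there to keep a handle on the original array:

```python
import numpy as np

b = np.ma.masked_where(np.arange(20) > -1, np.arange(20))  # every element masked
d = np.zeros(b.shape)
d_before = d                 # hypothetical extra name for the original object

d = d + b                    # builds a new object and rebinds the name d

print(d is d_before)                     # the name d now points to a different object
print(type(d_before))                    # the original is still a plain ndarray
print(isinstance(d, np.ma.MaskedArray))  # the result is a masked array
print(d.mask.all())                      # every element of the result is masked
```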
This difference in behavior is not NumPy-specific and applies to Python itself. The case in the question behaves the same way and, as @alaniwi mentioned in the comments, the boolean indexing with the mask of a is not fundamental to the behavior. Indexing b, c, and d with ~a.mask only restricts the operation to the elements that are not masked in a (rather than all elements of the arrays) and nothing more.
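To illustrate that this is plain Python behavior, here is a small sketch without NumPy. The lists and the classes Base and Loud are made up for this example; they only mimic the two mechanisms involved (in-place vs. rebinding, and the priority of a subclass's reflected operator):

```python
# In-place operator vs. rebinding, shown with plain lists.
x = [1, 2]
alias = x
x += [3]            # list.__iadd__ mutates the existing object
print(alias)        # [1, 2, 3] -> the alias sees the change

y = [1, 2]
alias_y = y
y = y + [3]         # list.__add__ builds a new list; the name y is rebound
print(alias_y)      # [1, 2] -> the original object is untouched


# Reflected-operator priority: if the right operand is an instance of a
# subclass of the left operand's type and overrides __radd__, Python
# tries that __radd__ before the left operand's __add__.
class Base:
    def __add__(self, other):
        return "Base.__add__"

    def __radd__(self, other):
        return "Base.__radd__"


class Loud(Base):  # hypothetical subclass, standing in for MaskedArray
    def __radd__(self, other):
        return "Loud.__radd__"


print(Base() + Base())  # Base.__add__
print(Base() + Loud())  # Loud.__radd__
```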
To make things a bit more interesting, and in fact clearer, let's switch the order of the operands on the right-hand side (using a fresh array e in place of d):
e = np.zeros(b.shape)
#e: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
e = b + e
#e: [-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --]
#e.data: [ 0. 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16. 17. 18. 19.]
Note that, similar to d = d + b, the right-hand side uses the masked-array __add__ function, so the output is a masked array. But since you are adding e to b (i.e. e = np.ma.MaskedArray.__add__(b, e)), the masked data of b is returned, while in d = d + b you are adding b to d, so the data of d is returned.
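A hedged sketch of the operand-order effect follows. The names zeros, left and right are hypothetical. NumPy's documentation warns that the data stored under a mask should not be relied upon, so the sketch only prints .data and uses filled(0) for anything that depends on the values:

```python
import numpy as np

b = np.ma.masked_where(np.arange(20) > -1, np.arange(20))  # every element masked
zeros = np.zeros(b.shape)   # hypothetical name for the unmasked operand

left = zeros + b    # unmasked operand first (like d = d + b)
right = b + zeros   # masked operand first (like e = b + e)

# Both results are fully masked MaskedArrays ...
print(left.mask.all(), right.mask.all())

# ... and only the hidden .data attribute may differ between them.
print(left.data)
print(right.data)

# Code that depends on the values can make the fill explicit instead.
print(left.filled(0))
print(right.filled(0))
```

Seen through the mask, the two results are equivalent; the difference is only in what happens to be stored underneath it.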