python dask masked-array

How to extract the mask from a Dask masked array?


In Dask, there is a class called MaskedArray, corresponding to the NumPy class with the same name. The NumPy class has the methods getdata and getmask, and it seems the Dask class is supposed to have them as well. However, the only mention of getmask I can find for the Dask class is as red text on the getdata documentation page, and when I try to call it in my script, I get

AttributeError: module 'dask.array.ma' has no attribute 'getmask'

So, where is the getmask method for the Dask class? Do I need to obtain the mask in some other way? Or is it not possible to extract the mask from a Dask masked array the way it is for a NumPy masked array, and if so, why not? Do I have to convert the Dask masked array to a NumPy masked array before I can extract the mask?


Solution

  • You can use the module-level function dask.array.ma.getmaskarray. Here's a simple example:

    In [2]: import dask.array.ma
       ...: import dask.array
       ...: import numpy as np
    
    In [3]: arr = dask.array.from_array(np.arange(16).reshape(4, 4), chunks=(2, 2))
       ...: mask = dask.array.from_array(np.random.random(size=(4, 4)) > 0.5, chunks=(2, 2))
       ...: masked = dask.array.ma.masked_array(arr, mask)
    
    In [4]: masked
    Out[4]: dask.array<masked_array, shape=(4, 4), dtype=int64, chunksize=(2, 2), chunktype=numpy.MaskedArray>
    
    In [5]: da_mask = dask.array.ma.getmaskarray(masked)
    
    In [6]: da_mask
    Out[6]: dask.array<getmaskarray, shape=(4, 4), dtype=bool, chunksize=(2, 2), chunktype=numpy.ndarray>
    
    In [7]: da_mask.compute()
    Out[7]:
    array([[False, False,  True,  True],
           [ True,  True,  True,  True],
           [False,  True, False, False],
           [ True, False,  True, False]])
    

    You can view the list of available module-level functions in the Dask Masked Array API Documentation.
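For a reproducible check, here is a minimal sketch that uses a fixed mask instead of a random one and compares the lazy route with the route the question asks about (computing to a NumPy masked array first). The variable names and the "mask every multiple of 3" rule are illustrative assumptions, not part of the original example.

```python
import numpy as np
import dask.array as da
import dask.array.ma  # makes sure the da.ma submodule is loaded

# Deterministic data and mask (illustrative choice: mask multiples of 3).
values = np.arange(16).reshape(4, 4)
mask_np = values % 3 == 0

masked = da.ma.masked_array(
    da.from_array(values, chunks=(2, 2)),
    mask=da.from_array(mask_np, chunks=(2, 2)),
)

# Lazy route: both results are Dask arrays until .compute() is called.
lazy_mask = da.ma.getmaskarray(masked)
lazy_data = da.ma.getdata(masked)
mask_out = lazy_mask.compute()
data_out = lazy_data.compute()

# Eager route: compute to a NumPy masked array, then use NumPy's own helper.
as_numpy = masked.compute()
mask_via_numpy = np.ma.getmaskarray(as_numpy)

print(np.array_equal(mask_out, mask_via_numpy))
```

Both routes should give the same boolean mask. The lazy route keeps the mask as a chunked Dask array, so converting to NumPy first is not required and is probably best avoided when the array does not fit in memory.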