python arrays numpy masked-array

What practical impact (if any) does the `fill_value` of a masked array have?


When displaying a MaskedArray, I'm told the data, the mask, and the fill value. Of course, data and mask are very important. But what is the practical significance of the fill value? I can even change it, but why would I want to do that — isn't the fill value just an implementation detail with no practical impact?

In other words: does the fill_value have any impact on any code not directly addressing fill_value?


Solution

  • Looking at the MaskedArray class code, I see:

    • methods for setting and getting fill_value

    • filled() method, which returns a copy with the masked values replaced by the fill_value. This is the 'direct' use of it.

    • methods that call filled as part of their calculation.

    masked.all() fills with True and then does the ordinary array all.

    masked.any() fills with False.

    masked.nonzero() does:

    return narray(self.filled(0), copy=False).nonzero()
    

    trace and sum also fill with 0, but prod fills with 1.
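As a quick check of the fixed fill values described above, here is a minimal sketch. The arrays and the odd fill values (-999, 12345) are made up for illustration; the point is only that these reductions do not depend on the array's own fill_value.

```python
import numpy as np

# A masked array whose own fill_value is deliberately odd (-999).
a = np.ma.array([1, 2, 3, 4], mask=[False, True, False, False], fill_value=-999)

# sum treats the masked entry as 0 and prod treats it as 1,
# regardless of the array's own fill_value.
total = a.sum()      # 1 + 3 + 4
product = a.prod()   # 1 * 3 * 4

# Changing the fill_value does not change these results.
b = a.copy()
b.fill_value = 12345
same_total = b.sum()
same_product = b.prod()

# nonzero reports only unmasked, non-zero positions.
nz = a.nonzero()[0]

# all() treats masked entries as True, any() treats them as False.
flags = np.ma.array([True, False], mask=[False, True])
all_result = bool(flags.all())   # the masked False is ignored
hidden = np.ma.array([False, True], mask=[False, True])
any_result = bool(hidden.any())  # the masked True is ignored
```

So for this group of methods the user-visible fill_value has no effect on the result.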

    argsort (and related methods such as argmin) use:

     d = self.filled(fill_value).view(ndarray)
    

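    Those methods take a fill_value parameter. When it is omitted they fall back to a default; in recent NumPy versions that default is derived per method from the dtype (chosen so that masked entries do not win the comparison) rather than read from self.fill_value, so check the source for the version you use. For methods like these, the user may have strong preferences about how masked values are treated when sorting or taking the minimum/maximum.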
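To illustrate how the fill_value parameter steers these methods, here is a small sketch. The data and the fill values (1e9, -1e9) are arbitrary choices for the example.

```python
import numpy as np

# The last entry is masked; its underlying data value (0.5) should not matter.
x = np.ma.array([3.0, 1.0, 2.0, 0.5], mask=[False, False, False, True])

# With a very large fill value the masked entry sorts last ...
order_last = x.argsort(fill_value=1e9)
# ... and with a very small one it sorts first.
order_first = x.argsort(fill_value=-1e9)

# argmin accepts the same parameter: here the masked slot is forced to "win".
idx_min_forced = x.argmin(fill_value=-1e9)
# Without an explicit fill_value the masked entry is not chosen as the minimum.
idx_min_default = x.argmin()
```

Here the caller decides where masked entries end up, which is the kind of preference the paragraph above refers to.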

    So filling is what lets the masked-array methods reuse the fast ordinary ndarray routines. Some methods require a specific fill value (0 for sum, 1 for prod, True for all), while others can use whatever value the user wants.
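For the 'direct' use through filled(), the array's own fill_value does matter as soon as the result is handed to code that knows nothing about masks. A minimal sketch, with made-up temperature data and NaN chosen as the fill value purely for illustration:

```python
import numpy as np

# Hypothetical readings; the middle one is masked as invalid.
temps = np.ma.array([20.5, 0.0, 22.1], mask=[False, True, False])

# filled() with no argument uses the array's own fill_value
# (1e20 is NumPy's default for floating-point masked arrays).
default_filled = temps.filled()

# Changing fill_value changes what plain-ndarray code will see.
temps.fill_value = np.nan
nan_filled = temps.filled()

# An explicit argument overrides the stored fill_value for this call only.
explicit = temps.filled(-1.0)

# NaN-aware functions on the filled copy now agree with the masked mean.
mean_after_fill = np.nanmean(nan_filled)
masked_mean = temps.mean()
```

Picking a fill value that downstream code already understands (NaN here) is one practical reason to change it.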