My data consists of several arrays of the same length. I mask one array (`y`) and then use that masked array to select values from a second array (`x`). I mask the arrays to get rid of values indicating equipment error (-9999). I then use `np.where()` to find where `y` is low (more than one standard deviation below the mean) so that I can see the values of `x` when `y` is low.
I have tried changing my mask several times, but none of the other NumPy masked-array operations gave me a different result. I also tried to write a logical statement that returns the values where the mask is `False`, but I cannot do that within the `np.where()` call.
```python
import numpy as np

x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
y = np.array([0, 1, -9999, 3, 4, 5, 6, 7, 8, -9999, 10])

# Mask the equipment-error flag (-9999) in both arrays
x = np.ma.masked_values(x, -9999)
y = np.ma.masked_values(y, -9999)

# Threshold: one standard deviation below the mean of y
low_y = y.mean() - np.std(y)
x_masked = x[np.where(y < low_y)]
```
When we inspect `x_masked`, it returns:

```
>>> x_masked
masked_array(data=[0, 1, 2, 9],
             mask=False,
       fill_value=-9999)
```
We expect the mean of `x_masked` to be 0.5, i.e. (0 + 1)/2, but instead it is 3, i.e. (0 + 1 + 2 + 9)/4, because the `x` values at indices 2 and 9, where `y` holds the masked -9999 values, were included in `x_masked`.
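To see where indices 2 and 9 come from, the sketch below inspects the intermediate comparison (an illustration using the question's values; `cond` and `idx` are illustrative names, not from the original). As far as I can tell, `y < low_y` returns a masked boolean array whose mask covers positions 2 and 9, but whose underlying data there is still the raw comparison `-9999 < low_y`, which is `True`. The one-argument form of `np.where()` works on that raw data and ignores the mask.

```python
import numpy as np

x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
y = np.ma.masked_values(np.array([0, 1, -9999, 3, 4, 5, 6, 7, 8, -9999, 10]), -9999)

# Masked-aware statistics: the two -9999 entries are ignored here
low_y = y.mean() - y.std()

cond = y < low_y   # masked boolean array
print(cond.mask)   # True at positions 2 and 9
print(cond.data)   # raw comparison results, including -9999 < low_y

# np.where(cond) uses the raw data and ignores cond.mask,
# so the masked positions 2 and 9 are returned as well
idx = np.where(cond)[0]
print(idx)
print(x[idx].mean())
```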
Is there a way to exclude the masked values so that only the unmasked values are returned?
Since version 1.8, NumPy has provided `nanstd` and `nanmean` to handle missing data.
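A small sketch of why the NaN-aware functions matter, using the question's raw `y` values (`y_nan` is an illustrative name): once a reading is `nan`, the ordinary `np.mean` returns `nan`, while `np.nanmean` and `np.nanstd` skip it. Note that this starts from the plain `ndarray`, not the masked one.

```python
import numpy as np

# Raw readings; -9999 flags an equipment error (values from the question)
y = np.array([0, 1, -9999, 3, 4, 5, 6, 7, 8, -9999, 10])

# Replace the error flag with NaN; the result is a float64 array
y_nan = np.where(y == -9999, np.nan, y)

print(np.mean(y_nan))     # nan: a single NaN propagates through the ordinary mean
print(np.nanmean(y_nan))  # mean of the 9 valid readings (44/9)
print(np.nanstd(y_nan))   # population std (ddof=0) of the valid readings
```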
to handle missing data. In your case since the -9999 is there to indicate error state and by definition I think it is a good use case of numpy.nan
```python
In [76]: y = np.where(y == -9999, np.nan, y)
In [77]: low_y = np.nanmean(y) - np.nanstd(y)
In [78]: low_y
Out[78]: 1.8177166753143883
In [79]: x_masked = x[np.where(y < low_y)]  # [0, 1]
```
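If you would rather keep the masked arrays instead of switching to NaN, a sketch of an alternative is to resolve the mask before indexing (`is_low`, `x_low` and `x_low2` are illustrative names). `MaskedArray.filled(False)` turns the masked comparison results into `False`, and `MaskedArray.nonzero()` only returns the indices of unmasked, nonzero elements.

```python
import numpy as np

x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
y = np.ma.masked_values(np.array([0, 1, -9999, 3, 4, 5, 6, 7, 8, -9999, 10]), -9999)

low_y = y.mean() - y.std()

# Option 1: treat masked comparison results as False, then use a boolean index
is_low = (y < low_y).filled(False)
x_low = x[is_low]

# Option 2: MaskedArray.nonzero() skips masked entries
x_low2 = x[(y < low_y).nonzero()]

print(x_low, x_low.mean())
print(x_low2)
```

Both options select `[0, 1]`, whose mean is the expected 0.5.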