python numpy weighted-average masked-array

How can I vectorize a masked weighted average with condition using numpy?


The unvectorized code reads:

import numpy as np
import numpy.ma as ma

np.random.seed(42)
H = np.random.uniform(0.1, 1.0, size=(6,8))
r, c = H.shape

mask = H.max(axis=1) > 0.95


x = np.linspace(0, 10, c)
weighted_averages = ma.masked_all((r,), dtype=H.dtype)

for i in range(r):
    if mask[i]:
        weighted_averages[i] = np.average(x, weights=H[i, :])
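For reference, each entry the loop fills in is the ordinary weighted mean of x, using row i of H as the weights. This is what np.average(x, weights=H[i, :]) computes, restricted to the rows selected by mask:

```latex
\bar{x}_i = \frac{\sum_{j=0}^{c-1} H_{ij}\, x_j}{\sum_{j=0}^{c-1} H_{ij}},
\qquad i \in \{\, i : \max_j H_{ij} > 0.95 \,\}
```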

Here's my attempt at vectorizing it:


_, xx = np.mgrid[0:10:r*1j, 0:10:c*1j]
not_mask = np.logical_not(mask)


weighted_averages = np.average(xx, weights=H, axis=1)
mwa = ma.masked_array(weighted_averages, mask=not_mask)

It works, in the sense that the outputs are the same, but I'm "cheating" because I first compute all the averages and then mask the "unwanted" values. How can I avoid the unnecessary computations? I'm guessing I have to somehow mask xx, H, or both.


Solution

  • How about this:

    import numpy as np
    import numpy.ma as ma
    
    np.random.seed(42)
    H = np.random.uniform(0.1, 1.0, size=(6,8))
    r, c = H.shape
    
    mask = H.max(axis=1) > 0.95
    
    x = np.linspace(0, 10, c)
    
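    # Keep only the rows that satisfy the condition, shape (k, c)
    H_mask = H[mask]
    # Weighted mean per row: x (shape (c,)) broadcasts against each row of H_mask
    wa = np.sum(x * H_mask, axis=1) / np.sum(H_mask, axis=1)
    # Start fully masked; assigning through the boolean mask unmasks those entries
    weighted_averages = ma.masked_all((r,), dtype=H.dtype)
    
    weighted_averages[mask] = wa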
    

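    Simply index H with the mask first, so only the wanted rows are ever averaged. I don't think you can pass x to np.average directly here, because np.average does not broadcast a against weights: when their shapes differ, weights has to be 1-D with the same length as a along axis, so the 1-D x cannot be paired with the 2-D H_mask. Hence, compute the weighted mean manually, as the row sums of x * H_mask divided by the row sums of H_mask.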
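If you would still rather use np.average, one way is to give it an a with exactly the same shape as the weights. The sketch below assumes NumPy's documented rule that equal-shaped a and weights are accepted with axis=1; it broadcasts x up to the shape of H_mask with np.broadcast_to and also writes the manual formula as a matrix-vector product. The names x_b, wa_avg and wa_dot are illustrative and not part of the answer above.

```python
import numpy as np
import numpy.ma as ma

np.random.seed(42)
H = np.random.uniform(0.1, 1.0, size=(6, 8))
r, c = H.shape
mask = H.max(axis=1) > 0.95
x = np.linspace(0, 10, c)

# Only the rows that pass the condition, shape (k, c)
H_mask = H[mask]

# Variant 1: make `a` the same shape as the weights so np.average accepts it.
# np.broadcast_to returns a read-only view, so x is not copied once per row.
x_b = np.broadcast_to(x, H_mask.shape)
wa_avg = np.average(x_b, weights=H_mask, axis=1)

# Variant 2: the manual formula, with the row sums of x * H_mask
# written as a matrix-vector product.
wa_dot = H_mask @ x / H_mask.sum(axis=1)

# Scatter the results back into a fully masked array of length r
weighted_averages = ma.masked_all((r,), dtype=H.dtype)
weighted_averages[mask] = wa_avg
```

The xx grid from the question plays the same role as x_b here, so np.average(xx[mask], weights=H[mask], axis=1) should give the same values up to floating-point rounding. Either way, only the selected rows are averaged.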