NumPy: apply masks to values, then take the mean, but in parallel

I have a 1D NumPy array of values:

import numpy as np
v = np.array([0, 1, 4, 0, 5])

I also have a 2D NumPy array of boolean masks, one mask per row (in production, there are millions of masks):

m = np.array([
    [True, True, False, False, False],
    [True, False, True, False, True],
    [True, True, True, True, True],
])

I want to apply each row from the mask to the array v, and then compute the mean of the masked values.

Expected behavior:

results = []
for mask in m:
    results.append(np.mean(v[mask]))

print(results) # [0.5, 3.0, 2.0]

This is easy to do sequentially, but I am sure there is a cleaner vectorized version. One solution that I've found:

mask = np.ones(m.shape)
mask[~m] = np.nan
np.nanmean(v * mask, axis=1) # [0.5, 3.0, 2.0]

Is there another solution, perhaps using the np.ma module? I am looking for something faster than the two approaches above.

Solution

I think the cleanest vectorized approach would be something like this:

result = np.broadcast_to(v, m.shape).mean(axis=1, where=m)

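This explicitly broadcasts v to the shape of m. np.broadcast_to returns a read-only view rather than a copy, so the broadcast itself does not duplicate v in memory, but the reduction still processes every element of the m-shaped view, so depending on your constraints it may not be optimal. Note that the where argument of mean requires NumPy 1.20 or later.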
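
Another option, offered as a sketch rather than a benchmarked recommendation: the sum of the selected values in each row is just the matrix-vector product m @ v, and the number of selected values is m.sum(axis=1), so the masked mean is their ratio. This is not from the original answer; it assumes every mask selects at least one element (a row with no True entries would divide by zero and yield nan with a warning).

```python
import numpy as np

v = np.array([0, 1, 4, 0, 5])
m = np.array([
    [True, True, False, False, False],
    [True, False, True, False, True],
    [True, True, True, True, True],
])

# Per-row sum of the selected values: True counts as 1, False as 0.
sums = m @ v
# Per-row number of selected values.
counts = m.sum(axis=1)
# Masked mean per row; assumes counts contains no zeros.
result = sums / counts

print(result)  # [0.5 3.  2. ]
```

Whether this beats the where-based version depends on the array sizes and dtypes, so it is worth timing both on data shaped like your production masks.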