Tags: python, arrays, numpy, mask

Taking mean along columns with masks in Python


I have a 2D array containing data from some measurements. I have to take the mean along each column, considering good data only. To do that, I have another 2D array of the same shape containing 1s and 0s that shows whether the data at each (i,j) is good or bad. Some of the "bad" data can be nan as well.

import numpy as np

def mean_exc_mask(x, mas):  # x is the real data array
                            # mas tells if the data at the location is good/bad
    sum_array   = np.zeros(len(x[0]))
    avg_array   = np.zeros(len(x[0]))
    items_array = np.zeros(len(x[0]))

    for i in range(0, len(x[0])):  # We take a specific column first
        for j in range(0, len(x)):  # And then parse across rows

            if mas[j][i] == 0:  # If the data is good
                sum_array[i] = sum_array[i] + x[j][i]
                items_array[i] = items_array[i] + 1

        if items_array[i] == 0:  # If none of the data is good for a particular column
            avg_array[i] = np.nan
        else:
            avg_array[i] = float(sum_array[i]) / items_array[i]
    return avg_array

I am getting all values as nan!

Any ideas about what is going wrong here, or is there some other way to do this?


Solution

  • The code seems to work for me, but you can do it a whole lot more simply by using the built-in aggregation in NumPy (here m is the mask array, with 0 marking good data):

    (x*(m==0)).sum(axis=0)/(m==0).sum(axis=0)

    I tried it with:

    x = np.array([[-0.32220561, -0.93043128,  0.37695923],
                  [ 0.08824206, -0.86961453, -0.54558324],
                  [-0.40942331, -0.60216952,  0.17834533]])
    m = np.array([[1, 1, 0],
                  [1, 0, 0],
                  [1, 1, 1]])

    If you post example data, it is often easier to give a qualified answer.
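One caveat with the one-liner: since the question says some of the bad entries can be nan, multiplying by the mask does not remove them, because nan*0 is still nan and it propagates into the column sum. A minimal sketch of a nan-tolerant variant follows, assuming the same convention as the question's code (0 in the mask means good). The function name masked_column_mean is made up for illustration, and this is only one possible way to do it:

```python
import numpy as np

def masked_column_mean(x, mask):
    """Column means over entries where mask == 0 (assumed to mean "good")."""
    x = np.asarray(x, dtype=float)
    good = np.asarray(mask) == 0

    # Replace bad entries with 0.0 instead of multiplying by the mask,
    # so a nan sitting at a bad position cannot leak into the sum.
    totals = np.where(good, x, 0.0).sum(axis=0)
    counts = good.sum(axis=0)

    # Columns with no good data stay nan; the others get total / count.
    means = np.full(x.shape[1], np.nan)
    np.divide(totals, counts, out=means, where=counts > 0)
    return means
```

If a nan appears at a position the mask marks as good, it still ends up in that column's mean, which matches what the loop in the question would do.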