Search code examples
pythonnumpymasked-array

Loop over clump_masked indices


I have an array y_filtered that contains some masked values. I want to replace these values by some value I calculate based on their neighbouring values. I can get the indices of the masked values by using masked_slices = ma.clump_masked(y_filtered). This returns a list of slices, e.g. [slice(194, 196, None)].

I can easily get the values from my masked array, by using y_filtered[masked_slices], and even loop over them. However, I need to access the index of the values as well, so i can calculate its new value based on its neighbours. Enumerate (logically) returns 0, 1, etc. instead of the indices I need.

Here's the solution I came up with.

# get indices of masked data
masked_slices = ma.clump_masked(y_filtered)

y_enum = [(i, y_i) for i, y_i in zip(range(len(y_filtered)), y_filtered)]

for sl in masked_slices:
    for i, y_i in y_enum[sl]:
        # simplified example calculation
        y_filtered[i] = np.average(y_filtered[i-2:i+2])

It is very ugly method i.m.o. and I think there has to be a better way to do this. Any suggestions?

Thanks!


Solution

  • EDIT:

    I figured out a better way to achieve what I think you want to do. This code picks every window of 5 elements and compute its (masked) average, then uses those values to fill the gaps in the original array. If some index does not have any unmasked value close enough it will just leave it as masked:

    import numpy as np
    from numpy.lib.stride_tricks import as_strided
    
    SMOOTH_MARGIN = 2
    x = np.ma.array(data=[1, 2, 3, 4, 5, 6, 8, 9, 10],
                    mask=[0, 1, 0, 0, 1, 1, 1, 1, 0])
    print(x)
    # [1 -- 3 4 -- -- -- -- 10]
    
    pad_data = np.pad(x.data, (SMOOTH_MARGIN, SMOOTH_MARGIN), mode='constant')
    pad_mask = np.pad(x.mask, (SMOOTH_MARGIN, SMOOTH_MARGIN), mode='constant',
                      constant_values=True)
    k = 2 * SMOOTH_MARGIN + 1
    isize = x.dtype.itemsize
    msize = x.mask.dtype.itemsize
    x_pad = np.ma.array(
        data=as_strided(pad_data, (len(x), k), (isize, isize), writeable=False),
        mask=as_strided(pad_mask, (len(x), k), (msize, msize), writeable=False))
    x_avg = np.ma.average(x_pad, axis=1).astype(x_pad.dtype)
    fill_mask = ~x_avg.mask & x.mask
    result = x.copy() 
    result[fill_mask] = x_avg[fill_mask]
    print(result)
    # [1 2 3 4 3 4 10 10 10]
    

    (note all the values are integers here because x was originally of integer type)

    The original posted code has a few errors, firstly it both reads and writes values from y_filtered in the loop, so the results of later indices are affected by the previous iterations, this could be fixed with a copy of the original y_filtered. Second, [i-2:i+2] should probably be [max(i-2, 0):i+3], in order to have a symmetric window starting at zero or later always.


    You could do this:

    from itertools import chain
    
    # get indices of masked data
    masked_slices = ma.clump_masked(y_filtered)
    for idx in chain.from_iterable(range(s.start, s.stop) for s in masked_slices):
        y_filtered[idx] = np.average(y_filtered[max(idx - 2, 0):idx + 3])