Fill Holes with Majority of Surrounding Values (Python)


I use Python and have an array with the values 1.0, 2.0, 3.0, 4.0, 5.0, 6.0 and np.nan as NoData.

I want to fill every "nan" with a value: the majority (the most frequent value) of the surrounding cells.

For example:

1 1 1 1 1
1 n 1 2 2
1 3 3 2 1
1 3 2 3 1

"n" represents "nan" in this example. The majority of its neighbors have the value 1, so "nan" should be replaced by 1.

Note that the holes consisting of "nan" can be 1 to 5 cells in size. For example (maximum size of 5 nan):

1 1 1 1 1
1 n n n 2
1 n n 2 1
1 3 2 3 1

Here the "nan" hole has the following surrounding values:

surrounding_values = [1,1,1,1,1,2,1,2,3,2,3,1,1,1] -> Majority = 1
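
The list above can be reproduced with a few lines of plain Python. This is only a minimal sketch: the helper name `surrounding_cells` is made up for illustration, it counts diagonal neighbors as adjacent (which is what the 14 values above imply), and it treats all "nan" cells in the grid as one hole.

```python
import math
from collections import Counter

n = float("nan")
grid = [
    [1, 1, 1, 1, 1],
    [1, n, n, n, 2],
    [1, n, n, 2, 1],
    [1, 3, 2, 3, 1],
]

def surrounding_cells(grid):
    """Return coordinates of non-NaN cells that touch a NaN cell (8 neighbors)."""
    rows, cols = len(grid), len(grid[0])
    found = set()  # a set, so a cell touching several NaNs is counted once
    for r in range(rows):
        for c in range(cols):
            if not math.isnan(grid[r][c]):
                continue
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    inside = 0 <= rr < rows and 0 <= cc < cols
                    if inside and not math.isnan(grid[rr][cc]):
                        found.add((rr, cc))
    return found

cells = surrounding_cells(grid)
surrounding_values = [grid[r][c] for r, c in sorted(cells)]
majority = Counter(surrounding_values).most_common(1)[0][0]
```

The value 1 occurs 9 times among the 14 neighbors, so it is the majority.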

I tried the following code:

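import numpy as np
# Note: Imputer was removed in scikit-learn 0.22; newer versions provide sklearn.impute.SimpleImputer instead.
from sklearn.preprocessing import Imputer

array = np.array(.......)   # consisting of 1.0-6.0 & np.nan
imp = Imputer(strategy="most_frequent")   # axis=0 by default: works column by column
fill = imp.fit_transform(array)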

This works pretty well. However, it only works along one axis (0 = column, 1 = row). The default is 0 (column), so each "nan" is filled with the most frequent value of its own column. For example:

Array
2 1 2 1 1
2 n 2 2 2
2 1 2 2 1
1 3 2 3 1

Filled Array
2 1 2 1 1
2 1 2 2 2
2 1 2 2 1
1 3 2 3 1

As you can see, although the majority of the surrounding values is 2, the most frequent value in the column is 1, so the "nan" becomes 1 instead of 2.

So I need another method in Python. Any suggestions or ideas?
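
To make the column-wise behavior concrete, here is a small plain-NumPy stand-in for the "most_frequent" strategy. It is a sketch, not scikit-learn's implementation: the function name `fill_by_column_mode` is made up for illustration, and tie-breaking may differ from the library.

```python
import numpy as np
from collections import Counter

n = np.nan
array = np.array([
    [2, 1, 2, 1, 1],
    [2, n, 2, 2, 2],
    [2, 1, 2, 2, 1],
    [1, 3, 2, 3, 1],
], dtype=float)

def fill_by_column_mode(arr):
    """Replace each NaN with the most frequent non-NaN value of its column."""
    filled = arr.copy()
    for col in range(arr.shape[1]):
        column = arr[:, col]
        valid = column[~np.isnan(column)]
        if valid.size == 0:
            continue  # an all-NaN column offers nothing to copy from
        mode = Counter(valid.tolist()).most_common(1)[0][0]
        filled[np.isnan(column), col] = mode
    return filled

filled = fill_by_column_mode(array)

# The 8 cells around the hole at row 1, column 1 (the hole itself excluded)
window = array[0:3, 0:3]
neighbors = window[~np.isnan(window)]
neighbor_majority = Counter(neighbors.tolist()).most_common(1)[0][0]
```

Column 1 holds the values 1, 1 and 3, so the hole is filled with 1, while six of its eight direct neighbors are 2.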


SUPPLEMENT:

Here you can see the result after I applied the very helpful improvement from Martin Valgur.

[Image: result after applying Martin Valgur's improvement]

Think of "0" as sea (blue) and of the other values (> 0) as land (red).

If there is a "little" sea surrounded by land (the sea can again be 1-5 px in size), it becomes land, as you can see in the result image. If the enclosed sea is bigger than 5 px or lies outside the land, it does not become land (this is not visible in the image because the case does not occur there).

If there is a 1 px "nan" whose neighbors are more sea than land, it still becomes land (in this example the split is 50/50).

The following picture shows what I need. At the border between sea (value = 0) and land (value > 0), the "nan" pixel needs to get the majority value of the neighboring land values.

[Image: desired result]

That sounds difficult; I hope I have explained it clearly.


Solution

  • A possible solution using label() and binary_dilation() from scipy.ndimage:

    import numpy as np
    from scipy.ndimage import label, binary_dilation
    from collections import Counter
    
    def impute(arr):
        imputed_array = np.copy(arr)
    
        # Label each connected group of NaN cells as a separate hole
        mask = np.isnan(arr)
        labels, count = label(mask)
        for idx in range(1, count + 1):
            hole = labels == idx
            # Ring of cells directly adjacent to the hole (no diagonals with the default structure)
            surrounding_values = arr[binary_dilation(hole) & ~hole]
            most_frequent = Counter(surrounding_values).most_common(1)[0][0]
            imputed_array[hole] = most_frequent
    
        return imputed_array
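
Note that label() and binary_dilation() do not count diagonal neighbors by default, whereas the 14 surrounding values listed in the question include them. If diagonals should count, a structuring element can be passed to both calls. The following is a sketch of that variant; the name `impute_8` and the guard for an all-NaN array are additions for illustration, not part of the code above.

```python
import numpy as np
from collections import Counter
from scipy.ndimage import label, binary_dilation, generate_binary_structure

def impute_8(arr):
    """Fill each NaN hole with the most frequent value among its neighbors,
    counting diagonal neighbors as adjacent."""
    # Full connectivity: for a 2-D array this is a 3x3 block of True
    structure = generate_binary_structure(arr.ndim, arr.ndim)
    imputed = np.copy(arr)
    labels, count = label(np.isnan(arr), structure=structure)
    for idx in range(1, count + 1):
        hole = labels == idx
        ring = binary_dilation(hole, structure=structure) & ~hole
        values = arr[ring]
        if values.size == 0:
            continue  # the whole array is NaN; nothing to copy from
        imputed[hole] = Counter(values.tolist()).most_common(1)[0][0]
    return imputed

n = np.nan
example = np.array([
    [2, 1, 2, 1, 1],
    [2, n, 2, 2, 2],
    [2, 1, 2, 2, 1],
    [1, 3, 2, 3, 1],
], dtype=float)
result = impute_8(example)
```

In this example the hole has two 1s and two 2s as direct neighbors, which is a tie; with diagonals included, six of the eight neighbors are 2, so the hole is filled with 2.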
    

    EDIT: Regarding your loosely-related follow-up question, you can extend the above code to achieve what you are after:

    import numpy as np
    from scipy.ndimage import label, binary_dilation, binary_closing
    
    def fill_land(arr):
        output = np.copy(arr)
    
        # Fill NaN-s
        mask = np.isnan(arr)
        labels, count = label(mask)
        for idx in range(1, count + 1):
            hole = labels == idx
            surrounding_values = arr[binary_dilation(hole) & ~hole]
            # 1.0 (land) if at least one neighbor is land, otherwise 0.0 (sea)
            output[hole] = surrounding_values.any()
    
        # Fill lakes
        land = output.astype(bool)
        # Sea cells that a morphological closing of the land would cover
        lakes = binary_closing(land) & ~land
        labels, count = label(lakes)
        for idx in range(1, count + 1):
            lake = labels == idx
            # Lakes of at most 5 px become land (1.0); larger ones stay sea (0.0)
            output[lake] = lake.sum() < 6
    
        return output
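
fill_land() writes 1 into every filled hole that touches land. If a hole should instead receive the most frequent land value among its neighbors, as described in the question, the first loop could be adapted roughly as follows. This is a sketch under assumptions: the name `fill_nan_with_land_majority` is made up, only direct (non-diagonal) neighbors are considered, and holes without any land neighbor are set to 0.

```python
import numpy as np
from collections import Counter
from scipy.ndimage import label, binary_dilation

def fill_nan_with_land_majority(arr):
    """Fill each NaN hole with the most frequent non-zero (land) neighbor value.

    Holes that touch no land at all are set to 0 (sea).
    """
    output = np.copy(arr)
    labels, count = label(np.isnan(arr))
    for idx in range(1, count + 1):
        hole = labels == idx
        neighbors = arr[binary_dilation(hole) & ~hole]
        land_values = neighbors[neighbors > 0]  # ignore sea cells
        if land_values.size:
            output[hole] = Counter(land_values.tolist()).most_common(1)[0][0]
        else:
            output[hole] = 0
    return output

n = np.nan
coast = np.array([
    [0, 0, 0, 0],
    [0, n, 3, 3],
    [0, 3, 3, 2],
], dtype=float)
filled_coast = fill_nan_with_land_majority(coast)
```

Here the hole has two sea neighbors and two land neighbors with the value 3, so it is filled with 3. The lake-filling step of fill_land() could then be run on the result unchanged.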