
How to create a mask array from the max value of all overlapping arrays?


I have many 2D arrays of the same shape. Each pixel has a value of 0, 1, or 23, and the values are distributed differently in each array. I want a max-mask: the result of overlaying all the arrays and taking the maximum value at each location. I also want a min-mask that does the opposite. Sorry if this is a stupid question... I searched, but only found methods that return a single max/min value over the entire array or along a certain axis. Thanks a lot in advance for any help!

My arrays are large, so here is a simple example:

a = ([[0, 1, 0, 0, 23],
      [1, 0, 0, 0, 1],
      [23, 23, 0, 1, 1],
      [1, 1, 23, 0, 1]])
b = ([[23, 1, 0, 1, 1],
      [1, 0, 0, 23, 1],
      [0, 23, 0, 23, 1],
      [1, 1, 0, 0, 23]])
# After some coding, max_mask and min_mask should be:
max_mask = ([[23, 1, 0, 1, 23],
             [1, 0, 0, 23, 1],
             [23, 23, 0, 23, 1],
             [1, 1, 23, 0, 23]])
min_mask = ([[0, 1, 0, 0, 1],
            [1, 0, 0, 0, 1],
            [0, 23, 0, 1, 1],
            [1, 1, 0, 0, 1]])

I have too many arrays to handle by hand: they are created by a generic function and named data1985, data1986, ... data2020. Is there an easier way to loop through all of them?

# this is how I create them by reading images
for i in range(1985, 2021):
        globals()[f"data{i}"], globals()[f"geo{i}"], globals()[f"proj{i}"]  = read_tif(r"C:\Users\wqtcl\Desktop\REDD\images/" +str(i)+".tif")
        globals()[f"data{i}"][np.isnan(globals()[f"data{i}"])]=23

# I want something like this (or easier!!)
# initialize array filled with zeros
mask = np.zeros([len(data1985), len(data1985[0])], dtype=int)

# populate array
for i in range(1985, 2021):
    for j in range(len(data1985)):
        for k in range(len(data1985[0])):
            mask[j][k] = max(globals()[f"data{i}"][j][k])

# I got this error though...
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_18020/2600759678.py in <module>
      6     for j in range(len(data1985)):
      7         for k in range(len(data1985[0])):
----> 8             mask[j][k] = max(globals()[f"data{i}"][j][k])
      9 
     10 print(mask)

TypeError: 'numpy.float32' object is not iterable

Solution

  • EDIT: regarding your edit, creating variables dynamically through globals() is rarely a good idea. Read your .tif images into a list of arrays instead, then apply the original solution to that list.

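    import numpy as np

    # read_tif is the asker's own reader; per the question it returns (data, geo, proj)
    path = r"C:\Users\wqtcl\Desktop\REDD\images/{}.tif"
    geo_data = {"data": [], "geo": [], "proj": []}

    for i in range(1985, 2021):
        data, geo, proj = read_tif(path.format(i))
        geo_data["data"].append(data)
        geo_data["geo"].append(geo)
        geo_data["proj"].append(proj)

    # stack the 2D images into one 3D array of shape (n_years, rows, cols)
    images = np.array(geo_data["data"])
    images[np.isnan(images)] = 23.  # replace NaN pixels with 23, as in the question
    max_mask = images.max(axis=0)   # per-pixel maximum across all years
    min_mask = images.min(axis=0)   # per-pixel minimum across all years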
    

    Note that I cannot test this as I do not have gdal installed and I don't have a bunch of random .tif files with which to test this approach.
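
Since the snippet above cannot be run without the .tif files, here is a minimal sketch of the same idea on synthetic data. `fake_read_tif` is a hypothetical stand-in invented for illustration (it is not the asker's `read_tif` and returns only the data array); the 4x5 shape and the single NaN pixel are assumptions chosen to keep the example small.

```python
import numpy as np

def fake_read_tif(year):
    """Hypothetical stand-in for read_tif: one small float32 image per year."""
    rng = np.random.default_rng(year)          # seeded so the result is repeatable
    data = rng.choice([0.0, 1.0], size=(4, 5)).astype(np.float32)
    data[0, 0] = np.nan                        # mimic a missing pixel
    return data

# Collect the images in a list instead of creating data1985, data1986, ...
image_list = [fake_read_tif(year) for year in range(1985, 2021)]

# Stack into shape (n_years, rows, cols); all images must share one shape
images = np.stack(image_list)
images[np.isnan(images)] = 23.0                # same NaN -> 23 rule as the question

max_mask = images.max(axis=0)                  # per-pixel maximum over all years
min_mask = images.min(axis=0)                  # per-pixel minimum over all years
```

Keeping the images in a list also means an individual year is still reachable by index, for example `image_list[0]` for 1985.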

    Original solution

    Here you go:

    In [9]: a
    Out[9]:
    array([[ 0,  1,  0,  0, 23],
           [ 1,  0,  0,  0,  1],
           [23, 23,  0,  1,  1],
           [ 1,  1, 23,  0,  1]])
    
    In [10]: b
    Out[10]:
    array([[23,  1,  0,  1,  1],
           [ 1,  0,  0, 23,  1],
           [ 0, 23,  0, 23,  1],
           [ 1,  1,  0,  0, 23]])
    
    In [11]: np.maximum(a, b)
    Out[11]:
    array([[23,  1,  0,  1, 23],
           [ 1,  0,  0, 23,  1],
           [23, 23,  0, 23,  1],
           [ 1,  1, 23,  0, 23]])
    
    In [12]: np.minimum(a, b)
    Out[12]:
    array([[ 0,  1,  0,  0,  1],
           [ 1,  0,  0,  0,  1],
           [ 0, 23,  0,  1,  1],
           [ 1,  1,  0,  0,  1]])
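
As a side note on the TypeError in the question: the builtin `max` expects an iterable (or several arguments), so calling it on a single pixel fails, whereas `np.maximum` and `np.minimum` compare two arrays element by element. A small sketch of the difference, using made-up 2x2 arrays rather than the question's data:

```python
import numpy as np

pixel = np.float32(23.0)
try:
    max(pixel)                      # a lone scalar is not iterable
except TypeError as exc:
    message = str(exc)              # same kind of error as in the question

small_a = np.array([[0, 1], [23, 0]])
small_b = np.array([[23, 1], [0, 0]])

elementwise_max = np.maximum(small_a, small_b)   # compares position by position
elementwise_min = np.minimum(small_a, small_b)
```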
    

    If you need to do this for more than two arrays, you can stack them into a 3D array and call .max(axis=0) (or .min(axis=0) for the min-mask):

    In [15]: c
    Out[15]:
    array([[23,  0, 23,  1,  1],
           [23, 23,  1,  0, 23],
           [23,  1, 23,  1,  0],
           [ 0,  0,  0, 23,  0]])
    
    In [16]: d
    Out[16]:
    array([[23,  0,  0, 23, 23],
           [ 1,  0,  1,  0, 23],
           [ 0,  0, 23, 23,  0],
           [ 1,  0, 23, 23,  0]])
    
    
    In [17]: np.array([a, b, c, d]).max(axis=0)
    Out[17]:
    array([[23,  1, 23, 23, 23],
           [23, 23,  1, 23, 23],
           [23, 23, 23, 23,  1],
           [ 1,  1, 23, 23, 23]])
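
If the images are too large to hold in a single 3D array, one possible alternative (not part of the original answer) is to keep running masks and update them one image at a time with `np.maximum` and `np.minimum`. The sketch below reuses a, b, c and d from above; the list `arrays` is only for illustration and could be replaced by a loop that reads one file per iteration.

```python
import numpy as np

a = np.array([[0, 1, 0, 0, 23], [1, 0, 0, 0, 1], [23, 23, 0, 1, 1], [1, 1, 23, 0, 1]])
b = np.array([[23, 1, 0, 1, 1], [1, 0, 0, 23, 1], [0, 23, 0, 23, 1], [1, 1, 0, 0, 23]])
c = np.array([[23, 0, 23, 1, 1], [23, 23, 1, 0, 23], [23, 1, 23, 1, 0], [0, 0, 0, 23, 0]])
d = np.array([[23, 0, 0, 23, 23], [1, 0, 1, 0, 23], [0, 0, 23, 23, 0], [1, 0, 23, 23, 0]])
arrays = [a, b, c, d]

# Start from a copy of the first image so the inputs are not modified
max_mask = arrays[0].copy()
min_mask = arrays[0].copy()

for arr in arrays[1:]:
    np.maximum(max_mask, arr, out=max_mask)   # update the running maximum in place
    np.minimum(min_mask, arr, out=min_mask)   # update the running minimum in place
```

This way only the two masks and the current image need to be in memory at once, and the result should match the stacked `.max(axis=0)` / `.min(axis=0)` approach.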