python numpy netcdf

Assigning values to a NumPy array based on multiple conditions on multiple arrays


I have ocean and atmospheric datasets in a netCDF file. The ocean data contains NaN (or a fill value such as -999) over land areas. For this example, assume it is NaN. Sample data looks like this:

import numpy as np
ocean = np.array([[2, 4, 5], [6, np.nan, 2], [9, 3, np.nan]])
atmos = np.array([[4, 2, 5], [6, 7, 3], [8, 3, 2]])

I want to apply multiple conditions to the ocean and atmos data to build a new array that contains only values from 1 to 8. For example, in the ocean data, values between 2 and 4 are assigned 1 and values between 4 and 6 are assigned 2. The same kind of comparison applies to the atmos dataset.

To simplify the comparison and assignment, I made a list of bin edges for each dataset and used np.digitize to assign the categories.

bin1 = [2, 4, 6]
bin2 = [4, 6, 8]
ocean_cat = np.digitize(ocean, bin1)
atmos_cat = np.digitize(atmos, bin2) 

which produces the following results (ocean_cat first, then atmos_cat):

[[1 2 2]
 [3 3 1]
 [3 1 3]]

[[1 0 1]
 [2 2 0]
 [3 0 0]]

Next, I want the element-wise maximum of these two arrays, so I used np.fmax:

final_cat = np.fmax(ocean_cat, atmos_cat)
print(final_cat)

which produces the following result:

[[1 2 2]
 [3 3 1]
 [3 1 3]]

This result is almost what I need. The only issue is that the NaN values are gone. What I want in the final result is:

[[1 2 2]
 [3 nan 1]
 [3 1 nan]]

Can someone help me put NaN back into the result at the same indices where the original ocean array is NaN?


Solution

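  • np.digitize returns integer bin indices and places NaN past the last bin edge (index 3 here), so the NaN positions are already lost before np.fmax runs. A simple option is to mask the output afterwards with numpy.where: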

    bin1 = [2, 4, 6]
    bin2 = [4, 6, 8]
    ocean_cat = np.digitize(ocean, bin1)
    atmos_cat = np.digitize(atmos, bin2)
    # NaN where ocean is missing, otherwise the element-wise maximum category
    final_cat = np.where(np.isnan(ocean), np.nan,
                         np.fmax(ocean_cat, atmos_cat))
    

    If both arrays can contain NaNs, mask wherever either one is NaN:

    final_cat = np.where(np.isnan(ocean)|np.isnan(atmos),
                         np.nan,
                         np.fmax(ocean_cat, atmos_cat))
    

    Or use np.isnan(ocean) & np.isnan(atmos) as the condition if you only want NaN where both inputs are NaN.

    Output for the sample data (using either of the first two snippets):

    array([[ 1.,  2.,  2.],
           [ 3., nan,  1.],
           [ 3.,  1., nan]])
    

    Generic approach for any number of input arrays:

    arrays = [ocean, atmos]
    bins = [bin1, bin2]

    out = np.where(
        np.logical_or.reduce([np.isnan(a) for a in arrays]),
        np.nan,
        np.fmax.reduce([np.digitize(a, b) for a, b in zip(arrays, bins)]),
    )
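
The question mentions that land cells may hold a fill value such as -999 instead of NaN. As a rough sketch only, the generic approach above could be wrapped in a helper that first converts such a fill value to NaN and lets the caller choose between the "any input is NaN" and "all inputs are NaN" masks. The function name `combine_categories` and the parameters `missing` and `require_all` are hypothetical names introduced here for illustration; they are not part of NumPy or of the original answer.

```python
import numpy as np


def combine_categories(arrays, bins, missing=None, require_all=False):
    """Element-wise maximum bin index across inputs, NaN where data is missing.

    Hypothetical helper: `missing` is an optional fill value (e.g. -999)
    and `require_all` selects the AND mask instead of the OR mask.
    """
    # Work on float copies so a fill value can be replaced by NaN
    # without modifying the caller's arrays.
    arrays = [np.array(a, dtype=float) for a in arrays]
    if missing is not None:
        for a in arrays:
            a[a == missing] = np.nan

    # One boolean NaN mask per input, combined with OR (default) or AND.
    nan_masks = [np.isnan(a) for a in arrays]
    combine = np.logical_and if require_all else np.logical_or
    mask = combine.reduce(nan_masks)

    # np.digitize returns integer indices, so NaN is restored afterwards.
    cats = np.fmax.reduce([np.digitize(a, b) for a, b in zip(arrays, bins)])
    return np.where(mask, np.nan, cats)


# Same sample data as in the question, but with -999 marking land cells.
ocean = np.array([[2, 4, 5], [6, -999, 2], [9, 3, -999]])
atmos = np.array([[4, 2, 5], [6, 7, 3], [8, 3, 2]])

final_cat = combine_categories(
    [ocean, atmos], [[2, 4, 6], [4, 6, 8]], missing=-999
)
print(final_cat)
```

With this sample data the two land cells (row 1, column 1 and row 2, column 2) come back as NaN. Passing `require_all=True` would leave them as category 3, because atmos has no missing values at those positions.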