Search code examples
pythonarraysnumpycoalesce

Python Numpy: Coalesce and return first nonzero observation


I am currently new to NumPy, but very proficient with SQL. I used a function called coalesce in SQL, which I was disappointed not to find in NumPy. I need this function to create a third array want from 2 arrays i.e. array1 and array2, where zero/ missing observations in array1 are replaced by observations in array2 under the same address/Location. I can't figure out how to use np.where?

Once this task is accomplished, I would like to take the lower diagonal of this array want and then populate a final array want2 noting the first non-zero observation. If all observations i.e. coalesce(array1, array2) returns missing or 0 in want2, then assign by default zero.

I have written an example demonstrating the desired behavior.

import numpy as np
array1= np.array(([-10,0,20],[-1,0,0],[0,34,-50]))
array2= np.array(([10,10,50],[10,0,25],[50,45,0]))

# Coalesce array1,array2 i.e. get the first non-zero value from array1, then from array2. 
# if array1 is empty or zero, then populate table want with values from array2 under same address
want=np.tril(np.array(([-10,10,20],[-1,0,25],[50,34,-50])))

print(array1)
print(array2)
print(want)

# print first instance of nonzero observation from each column of table want
want2=np.array([-10,34,-50])
print(want2)

Solution

  • "Coalesce": use putmask to replace values equal to zero with values from array2:

    want = array1.copy()
    np.putmask(array1.copy(), array1==0, array2)
    

    First nonzero element of each column of np.tril(want):

    where_nonzero = np.where(np.tril(want) != 0)
    
    """For the where array, get the indices of only 
    the first index for each column"""
    first_indices = np.unique(where_nonzero[1], return_index=True)[1]
    
    # Get the values from want for those indices
    want2 = want[(where_nonzero[0][first_indices], where_nonzero[1][first_indices])]