Search code examples
pythonarraysnumpycompound-assignment

vectorize the assignment of 3d numpy arrays conditioned on the associate values at other dimensions


Is it possible to vectorize the following code in Python? It runs very slowly when the size of the array becomes large.

import numpy as np

# A, B, C are 3d arrays with shape (K, N, N). 
# Entries in A, B, and C are in [0, 1]. 
# In the following, I use random values in B and C as an example.

K = 5
N = 10000
A = np.zeros((K, N, N))
B = np.random.normal(0, 1, (K, N, N))
C = np.random.normal(0, 1, (K, N, N))

for k in range(K):
    for m in [x for x in range(K) if x != k]:
        for i in range(N):
            for j in range(N):
                if A[m, i, j] not in [0, 1]:
                    if A[k, i, j] == 1:
                        A[m, i, j] = B[m ,i ,j]
                    if A[k ,i, j] == 0:
                        A[m, i, j] = C[m, i, j]


Solution

  • I can help with a partial vectorization that should speed things up quite a bit, but I'm not sure on your logic for k vs. m, so didn't try to include that part. Essentially, you create a mask with the conditions you want checked across the 2nd and 3rd dimensions of A. Then map between A and either B or C using the appropriate mask:

    # A, B, C are 3d arrays with shape (K, N, N). 
    # Entries in A, B, and C are in [0, 1]. 
    # In the following, I use random values in B and C as an example.
    
    np.random.seed(10)
    
    K = 5
    N = 1000
    A = np.zeros((K, N, N))
    B = np.random.normal(0, 1, (K, N, N))
    C = np.random.normal(0, 1, (K, N, N))
    
    for k in range(K):
        for m in [x for x in range(K) if x != k]:
            #if A[m, i, j] not in [0, 1]:
            mask_1 = A[k, :, :] == 1
            mask_0 = A[k, :, :] == 0
            A[m, mask_1] = B[m, mask_1]
            A[m, mask_0] = C[m, mask_0]
    

    I omitted the A[m, i, j] not in [0, 1] part because this made it difficult to debug since nothing happens (A is initialized as all zeros). If you need to include additional logic like this, just create another mask for it and include it in with an and in each mask's logic.

    Update on 7/6/22 If you want to update the above code to remove the loop over m, then you can initialize an array with all the values of k, and use that to expand the mask to include all 3 dimensions, excluding each value of k that matches m as follows:

    np.random.seed(10)
    
    K = 5
    N = 1000
    A_2 = np.zeros((K, N, N))
    B = np.random.normal(0, 1, (K, N, N))
    C = np.random.normal(0, 1, (K, N, N))
    K_vals = np.array(range(K))
    
    for k in range(K):
        #for m in [x for x in range(K) if x != k]:
            #if A[m, i, j] not in [0, 1]:
        k_dim_2_skip = K_vals == k
        mask_1 = np.tile(A_2[k, :, :] == 1, (K, 1, 1))
        mask_1[k_dim_2_skip, :, :] = False
        mask_0 = np.tile(A_2[k, :, :] == 0, (K, 1, 1))
        mask_0[k_dim_2_skip, :, :] = False
        A_2[mask_1] = B[mask_1]
        A_2[mask_0] = C[mask_0]
    

    Use these masks with the & np.logical_not... code you added in the comment below and that should do it. Note the more you vectorize, the larger the arrays you're manipulating for masks, etc. get, so there is a tradeoff with memory consumption. There is usually a sweet spot to balance run time vs memory usage for a given problem.