Search code examples
pythonarraysnumpymatrixvectorization

Vectorized method to match and compare elements of two matrices


I have two matrices that contain only 1's and 0's:
A, shape n x m.
B, shape n x o.

Conceptually, the "n" rows represent facilities that contain products "m" and serve customer locations "o".

I want to count each combination of customer "o" and product "m" that is served by at least one facility "n".

This is not as easy as doing simple sums, since I am not adding up the entire data structure. I am instead adding up the number of times each "o" and "m" combination has at least facility (not total facilities).

I have successfully done this by looping through columns of B, broadcasting a multiply, summing the rows, and then summing a conditional check of whether each row's sum was > 0.

Yet, loops are bad and this feels like the kind of thing numpy can do in a vectorized way. I've looked into np.stack and np.meshgrid. Neither seemed to work for me.

Any smart ideas on if / how to vectorize this?

Here is my looped solution. The correct answer is 145.

import numpy as np

A = np.array(
    [[1., 0., 1., 1., 1., 0., 1., 1., 1., 1.],
    [0., 0., 1., 1., 0., 1., 1., 0., 0., 1.],
    [0., 1., 1., 0., 0., 1., 0., 0., 1., 1.],
    [0., 0., 0., 0., 0., 1., 0., 1., 0., 0.],
    [0., 0., 1., 1., 0., 0., 0., 0., 0., 1.]]
)

print(A.shape)
A

B = np.array(
    [[0., 1., 0., 0., 1., 0., 1., 1., 0., 1., 0., 1., 1., 0., 1., 0., 0., 0., 1., 1.],
    [1., 0., 0., 0., 0., 1., 1., 0., 0., 0., 0., 0., 0., 0., 1., 1., 0., 0., 1., 1.],
    [0., 1., 1., 0., 1., 0., 0., 1., 0., 0., 0., 1., 1., 1., 1., 0., 0., 1., 0., 0.],
    [0., 1., 1., 1., 1., 0., 0., 1., 1., 1., 1., 1., 1., 0., 0., 0., 1., 0., 0., 0.],
    [1., 1., 0., 1., 0., 0., 0., 0., 0., 1., 1., 1., 1., 0., 1., 0., 1., 1., 0., 1.]]
)

print(B.shape)
B

def count_conditions(A: np.ndarray, B: np.ndarray):
    # Inputs are all 0 or 1.
    assert np.all((A == 0) | (A == 1))
    assert np.all((B == 0) | (B == 1))
    
    running_total = 0
    
    for col in range(B.shape[1]):
        # Select column and broadcast multiply.
        this_broadcast = A.T * B[:,col]
        # Sum each row.
        row_sums = np.sum(this_broadcast, axis=1)
        # Count how many rows had at least one.
        running_total += np.sum(row_sums > 0)
    
    return running_total

result = count_conditions(A=A, B=B)

print(result)
assert result == 145

Solution

  • I think you want matrix multiplication:

    out = (A.T @ B).astype(bool).sum()
    

    or equivalent:

    out = (A.T @ B > 0).sum()
    

    You can also convert A,B to boolean type:

    out = (A.astype(bool).T @ B.astype(bool)).sum()
    

    All give out==145