I have two matrices that contain only 1's and 0's:
A, shape n x m.
B, shape n x o.
Conceptually, the "n" rows represent facilities that contain products "m" and serve customer locations "o".
I want to count each combination of customer "o" and product "m" that is served by at least one facility "n".
This is not as easy as doing simple sums, since I am not adding up the entire data structure. I am instead adding up the number of times each "o" and "m" combination has at least facility (not total facilities).
I have successfully done this by looping through columns of B, broadcasting a multiply, summing the rows, and then summing a conditional check of whether each row's sum was > 0.
Yet, loops are bad and this feels like the kind of thing numpy can do in a vectorized way. I've looked into np.stack and np.meshgrid. Neither seemed to work for me.
Any smart ideas on if / how to vectorize this?
Here is my looped solution. The correct answer is 145.
import numpy as np
A = np.array(
[[1., 0., 1., 1., 1., 0., 1., 1., 1., 1.],
[0., 0., 1., 1., 0., 1., 1., 0., 0., 1.],
[0., 1., 1., 0., 0., 1., 0., 0., 1., 1.],
[0., 0., 0., 0., 0., 1., 0., 1., 0., 0.],
[0., 0., 1., 1., 0., 0., 0., 0., 0., 1.]]
)
print(A.shape)
A
B = np.array(
[[0., 1., 0., 0., 1., 0., 1., 1., 0., 1., 0., 1., 1., 0., 1., 0., 0., 0., 1., 1.],
[1., 0., 0., 0., 0., 1., 1., 0., 0., 0., 0., 0., 0., 0., 1., 1., 0., 0., 1., 1.],
[0., 1., 1., 0., 1., 0., 0., 1., 0., 0., 0., 1., 1., 1., 1., 0., 0., 1., 0., 0.],
[0., 1., 1., 1., 1., 0., 0., 1., 1., 1., 1., 1., 1., 0., 0., 0., 1., 0., 0., 0.],
[1., 1., 0., 1., 0., 0., 0., 0., 0., 1., 1., 1., 1., 0., 1., 0., 1., 1., 0., 1.]]
)
print(B.shape)
B
def count_conditions(A: np.ndarray, B: np.ndarray):
# Inputs are all 0 or 1.
assert np.all((A == 0) | (A == 1))
assert np.all((B == 0) | (B == 1))
running_total = 0
for col in range(B.shape[1]):
# Select column and broadcast multiply.
this_broadcast = A.T * B[:,col]
# Sum each row.
row_sums = np.sum(this_broadcast, axis=1)
# Count how many rows had at least one.
running_total += np.sum(row_sums > 0)
return running_total
result = count_conditions(A=A, B=B)
print(result)
assert result == 145
I think you want matrix multiplication:
out = (A.T @ B).astype(bool).sum()
or equivalent:
out = (A.T @ B > 0).sum()
You can also convert A,B
to boolean type:
out = (A.astype(bool).T @ B.astype(bool)).sum()
All give out==145