Tags: python, numpy, machine-learning, loss-function

Perfect numpy implementation for this function


This is a fairly direct question; I will generalize it a bit at the end.

I am trying to implement this function in numpy. I have been successful using nested for loops, but I can't think of a vectorized numpy way to do it.

Function to implement:
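
The image of the formula did not carry over; reading it off the loop implementation below, the loss (computed per batch element) appears to be, in my notation:

$$\mathrm{output} = (1-\epsilon)\sum_{b}\sum_{b_1 \neq b}\sum_{l}\sum_{l_1} s_{l,l_1}\,\bigl(pp_{b,l}-pp_{b_1,l_1}\bigr)^2 \;+\; \epsilon\, n_b \sum_{b}\sum_{l}\lVert s_{l,:}\rVert_1\,\bigl(pp_{b,l}-p_{b,l}\bigr)^2$$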

My way of implementation:

import numpy as np
from scipy.special import softmax

bs = 10 # batch size
nb = 8  # number of bounding boxes
nc = 15 # number of classes

bbox = np.random.random(size=(bs, nb, 4)) # model output bounding boxes

p = np.random.random(size=(bs, nb, nc)) # model output probability
p = softmax(p, axis=-1)

s_rand = np.random.random(size=(nc, nc))
s = (s_rand + s_rand.T)/2 # similarity matrix

pp = np.random.random(size=(bs, nb, nc)) # proposed probability
pp = softmax(pp, axis=-1)

first_term = 0
for b in range(nb):
    for b_1 in range(nb):
        if b_1 == b:
            continue
        for l in range(nc):
            for l_1 in range(nc):
                first_term += (s[l, l_1] * (pp[:, b, l] - pp[:, b_1, l_1])**2)
second_term = 0
for b in range(nb):
    for l in range(nc):
        second_term += (np.linalg.norm(s[l, :], ord=1) * (pp[:, b, l] - p[:, b, l])**2)
second_term *= nb

epsilon = 0.5
output = ((1 - epsilon) * first_term) + (epsilon * second_term)

I have tried hard to remove the loops and use np.tile and np.repeat instead, but I can't think of a possible way.
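
To illustrate the kind of conversion I mean, here is a small sketch (on a 1-D toy array, not the loss above) of building all pairwise differences with np.repeat/np.tile, and the equivalent broadcasting form:

import numpy as np

x = np.arange(4)

# explicit expansion: repeat the "left" operand, tile the "right" one
left = np.repeat(x, x.size)    # [0, 0, 0, 0, 1, 1, 1, 1, ...]
right = np.tile(x, x.size)     # [0, 1, 2, 3, 0, 1, 2, 3, ...]
pairwise = (left - right).reshape(x.size, x.size)

# the same result via broadcasting, without materializing the tiled copies
pairwise_bc = x[:, None] - x[None, :]

assert np.array_equal(pairwise, pairwise_bc)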

I have also searched Google for exercises that could help me learn such loop-to-numpy conversions, but wasn't successful.


Solution

  • Fully vectorized code (removal of the first two loops is inspired by L.Iridium's answer):

    # all pairwise differences over (b, b_1, l, l_1): shape (bs, nb, nb, nc, nc)
    squared_diff = (pp[:, :, None, :, None] - pp[:, None, :, None, :]) ** 2
    # s has shape (nc, nc) and broadcasts over the last two axes
    weighted_diff = s * squared_diff
    # sum over (l, l_1), then zero out the b == b_1 entries
    b_eq_b_1_removed = weighted_diff.sum(axis=(3, 4)) * (1 - np.eye(nb))
    first_term = b_eq_b_1_removed.sum(axis=(1, 2))

    # L1 norm of each row of s: shape (nc,)
    normalized_s = np.linalg.norm(s, ord=1, axis=1)
    squared_diff = (pp - p) ** 2
    second_term = nb * (normalized_s * squared_diff).sum(axis=(1, 2))

    loss = ((1 - epsilon) * first_term) + (epsilon * second_term)
    

    Timeit track: 512 µs ± 13 µs per loop

    Timeit track of code posted in question: 62.5 ms ± 197 µs per loop

    That's a huge improvement: roughly a 120x speedup over the loop version.
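
    A quick sanity check (a sketch, assuming the question's loop implementation has already produced `output` from the same p, pp, and s arrays) is to compare the two results directly:

    # the vectorized loss should match the loop-based loss element-wise
    assert np.allclose(loss, output)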