I'm a TensorFlow beginner and I'm trying to reproduce the TF Classification-by-Retrieval model as explained here in Python, since the blog provides the code in C++.
The model architecture seems to have been successfully reproduced, as shown in the model architecture. I used a for loop and tf.nn.embedding_lookup() to create a "branch" for each class, which is then aggregated (tf.reduce_max) and concatenated into the final output layer. The problem is that the output always yields the value of one class only.
Here is my code,
input = Input([None, None, 3], dtype=tf.uint8)

# Cast and preprocess inside the model so it accepts raw uint8 images
preprocess_layer = tf.cast(input, tf.float32)
preprocess_layer = tf.keras.applications.mobilenet.preprocess_input(preprocess_layer)

# Embedding: backbone -> flatten -> L2 normalization
x = MobNetSmall(preprocess_layer)
x = Flatten()(x)
x = Lambda(lambda x: tf.nn.l2_normalize(x), name='l2_norm_layer')(x)

# Retrieval layer: one output per index instance, frozen weights
retrieval_output = Dense(
    num_instances,
    kernel_initializer=weights_matrix,
    activation="linear",
    trainable=False,
    name='retrieval_layer')(x)

# One "branch" per class: select that class's retrieval scores, take the max
labels = [fn.split('-')[0]+'-'+fn.split('-')[1] for fn in filenames]
class_id = set(labels)
selection_layer_output = list()
for ci in class_id:
    class_index = [i for i, x in enumerate(labels) if x == ci]
    class_index = tf.cast(class_index, tf.int32)
    x = Lambda(lambda x: tf.nn.embedding_lookup(x[0], class_index), name=f'{ci}_selection_layer')(retrieval_output)
    x = Lambda(lambda x: tf.reduce_max(x), name=f'{ci}_aggregate_max')(x)
    selection_layer_output.append(x)

concatenated_ouput = tf.stack(selection_layer_output, axis=0)

model = Model(inputs=preprocess_layer, outputs=concatenated_ouput)
model.summary()
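For context, MobNetSmall, weights_matrix, num_instances and filenames come from setup code that isn't shown above. A rough sketch of the assumed setup (the backbone variant, input size, and placeholder data are assumptions for illustration, not the original code): the retrieval layer's kernel holds the L2-normalized embeddings of the index images, so the Dense layer produces one similarity score per index instance.

# Sketch of the assumed setup for the names used above -- not the original code.
import numpy as np
import tensorflow as tf

# Backbone: a MobileNet-family feature extractor (V3-Small assumed from the name)
MobNetSmall = tf.keras.applications.MobileNetV3Small(
    input_shape=(224, 224, 3), include_top=False, pooling='avg')

# index_images stands in for the preprocessed index images (one per filename);
# random data here only keeps the sketch self-contained.
index_images = np.random.rand(5, 224, 224, 3).astype(np.float32) * 255.0

# Each index image contributes one row: its L2-normalized embedding.
embeddings = MobNetSmall.predict(index_images)
embeddings = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)

num_instances = embeddings.shape[0]                      # one output per index image
weights_matrix = tf.constant_initializer(embeddings.T)   # kernel shape: (dim, num_instances)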
And here is the output when I try to predict a test image,
root = tk.Tk()
root.update()
filename = askopenfilename(filetypes=[("images", ["*.jpg", "*.jpeg", "*.png"])])
img = cv2.imread(filename)
root.destroy()
query_imgarr = preprocess_img(img)
model_output = model.predict(query_imgarr)
model_output
>>> array([0.92890763, 0.92890763, 0.92890763, 0.92890763, 0.92890763],
dtype=float32)
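For reference, preprocess_img isn't shown here; assume a small helper along these lines (the exact resize size depends on the backbone), since the cast and MobileNet preprocessing already happen inside the model:

import cv2
import numpy as np

def preprocess_img(img):
    # Assumed helper (not in the original post): BGR -> RGB, resize to the
    # backbone's input size, add a batch dimension. Casting and preprocessing
    # are done inside the model itself (tf.cast + preprocess_input above).
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    img = cv2.resize(img, (224, 224))
    return np.expand_dims(img, axis=0)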
When I do the embedding lookup and aggregation separately, the output is correct. As seen below, the model above only ever yields the value of the 4th class.
labels = [fn.split('-')[0]+'-'+fn.split('-')[1] for fn in filenames]
class_id = set(labels)
for ci in class_id:
    class_index = [i for i, x in enumerate(labels) if x == ci]
    class_predictions = tf.nn.embedding_lookup(model_output[0], class_index)
    output_ = tf.reduce_max(class_predictions)
    print(output_)
>>> tf.Tensor(0.49454707, shape=(), dtype=float32)
>>> tf.Tensor(0.6946863, shape=(), dtype=float32)
>>> tf.Tensor(0.62603784, shape=(), dtype=float32)
>>> tf.Tensor(0.92890763, shape=(), dtype=float32)
>>> tf.Tensor(0.59326285, shape=(), dtype=float32)
Any help would be appreciated, thanks!
So after looking around, and referring to this thread, the "correct" way to use TF operations inside a Keras model (in my case tf.nn.embedding_lookup and tf.reduce_max) is to wrap them in a Layer subclass, i.e. to make a custom layer.
class AggregationLayer(tf.keras.layers.Layer):
    def __init__(self, class_index):
        # indices (into the retrieval layer's outputs) belonging to one class
        self.class_index = class_index
        super(AggregationLayer, self).__init__()

    def call(self, inputs, **kwargs):
        # select this class's retrieval scores and aggregate them with max
        x = tf.nn.embedding_lookup(inputs[0], self.class_index)
        x = tf.reduce_max(x)
        return x
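With this layer, the Lambda branches in the model-building loop above become plain custom-layer calls; roughly (same variable names as before, sketch only):

selection_layer_output = list()
for ci in class_id:
    class_index = [i for i, x in enumerate(labels) if x == ci]
    class_index = tf.cast(class_index, tf.int32)
    # each branch keeps its own copy of class_index inside the layer
    x = AggregationLayer(class_index)(retrieval_output)
    selection_layer_output.append(x)

concatenated_ouput = tf.stack(selection_layer_output, axis=0)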
This solution solves my problem.