I'm a TensorFlow beginner and I'm trying to reproduce the TF Classification-by-Retrieval model as explained here in Python, since the blog provides the code in C++.
The model architecture seems to have been successfully reproduced, as shown in the model architecture. I used a for loop and tf.nn.embedding_lookup() to create a "branch" for each class, which is then aggregated (tf.reduce_max) and concatenated into the final output layer. The problem is that the output always yields the value of one class only.
Here is my code,
input = Input([None, None, 3], dtype=tf.uint8)

# Cast and preprocess inside the model so it accepts raw uint8 images
preprocess_layer = tf.cast(input, tf.float32)
preprocess_layer = tf.keras.applications.mobilenet.preprocess_input(preprocess_layer)

# Embedding: backbone -> flatten -> L2 normalization
x = MobNetSmall(preprocess_layer)
x = Flatten()(x)
x = Lambda(lambda x: tf.nn.l2_normalize(x), name='l2_norm_layer')(x)

# Retrieval layer: one output per index instance, frozen weights
retrieval_output = Dense(
    num_instances,
    kernel_initializer=weights_matrix,
    activation="linear",
    trainable=False,
    name='retrieval_layer')(x)

# One "branch" per class: select that class's retrieval scores, take the max
labels = [fn.split('-')[0]+'-'+fn.split('-')[1] for fn in filenames]
class_id = set(labels)
selection_layer_output = list()
for ci in class_id:
    class_index = [i for i, x in enumerate(labels) if x == ci]
    class_index = tf.cast(class_index, tf.int32)
    x = Lambda(lambda x: tf.nn.embedding_lookup(x[0], class_index), name=f'{ci}_selection_layer')(retrieval_output)
    x = Lambda(lambda x: tf.reduce_max(x), name=f'{ci}_aggregate_max')(x)
    selection_layer_output.append(x)

concatenated_ouput = tf.stack(selection_layer_output, axis=0)

model = Model(inputs=preprocess_layer, outputs=concatenated_ouput)
model.summary()
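For context, MobNetSmall, weights_matrix, num_instances and filenames come from setup code that isn't shown above. A rough sketch of the assumed setup (the backbone variant, input size, and placeholder data are assumptions for illustration, not the original code): the retrieval layer's kernel holds the L2-normalized embeddings of the index images, so the Dense layer produces one similarity score per index instance.

# Sketch of the assumed setup for the names used above -- not the original code.
import numpy as np
import tensorflow as tf

# Backbone: a MobileNet-family feature extractor (V3-Small assumed from the name)
MobNetSmall = tf.keras.applications.MobileNetV3Small(
    input_shape=(224, 224, 3), include_top=False, pooling='avg')

# index_images stands in for the preprocessed index images (one per filename);
# random data here only keeps the sketch self-contained.
index_images = np.random.rand(5, 224, 224, 3).astype(np.float32) * 255.0

# Each index image contributes one row: its L2-normalized embedding.
embeddings = MobNetSmall.predict(index_images)
embeddings = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)

num_instances = embeddings.shape[0]                      # one output per index image
weights_matrix = tf.constant_initializer(embeddings.T)   # kernel shape: (dim, num_instances)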
And here is the output when I try to predict a test image,
root = tk.Tk()
root.update()
filename = askopenfilename(filetypes=[("images", ["*.jpg", "*.jpeg", "*.png"])])
img = cv2.imread(filename)
root.destroy()
query_imgarr = preprocess_img(img)
model_output = model.predict(query_imgarr)
model_output
>>> array([0.92890763, 0.92890763, 0.92890763, 0.92890763, 0.92890763],
dtype=float32)
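For reference, preprocess_img isn't shown here; assume a small helper along these lines (the exact resize size depends on the backbone), since the cast and MobileNet preprocessing already happen inside the model:

import cv2
import numpy as np

def preprocess_img(img):
    # Assumed helper (not in the original post): BGR -> RGB, resize to the
    # backbone's input size, add a batch dimension. Casting and preprocessing
    # are done inside the model itself (tf.cast + preprocess_input above).
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    img = cv2.resize(img, (224, 224))
    return np.expand_dims(img, axis=0)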
When I do the embedding lookup and aggregation separately, the output is correct. As seen below, the model above only ever yields the value of the 4th class.
labels = [fn.split('-')[0]+'-'+fn.split('-')[1] for fn in filenames]
class_id = set(labels)
for ci in class_id:
    class_index = [i for i, x in enumerate(labels) if x == ci]
    class_predictions = tf.nn.embedding_lookup(model_output[0], class_index)
    output_ = tf.reduce_max(class_predictions)
    print(output_)
>>> tf.Tensor(0.49454707, shape=(), dtype=float32)
>>> tf.Tensor(0.6946863, shape=(), dtype=float32)
>>> tf.Tensor(0.62603784, shape=(), dtype=float32)
>>> tf.Tensor(0.92890763, shape=(), dtype=float32)
>>> tf.Tensor(0.59326285, shape=(), dtype=float32)
Any help would be appreciated, thanks!
So after looking around, and referring to this thread, the "correct" way to use TF operations inside a Keras model (in my case tf.nn.embedding_lookup and tf.reduce_max) is to wrap them in a Layer subclass, i.e. to make a custom layer.
class AggregationLayer(tf.keras.layers.Layer):
    def __init__(self, class_index):
        # indices (into the retrieval layer's outputs) belonging to one class
        self.class_index = class_index
        super(AggregationLayer, self).__init__()

    def call(self, inputs, **kwargs):
        # select this class's retrieval scores and aggregate them with max
        x = tf.nn.embedding_lookup(inputs[0], self.class_index)
        x = tf.reduce_max(x)
        return x
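With this layer, the Lambda branches in the model-building loop above become plain custom-layer calls; roughly (same variable names as before, sketch only):

selection_layer_output = list()
for ci in class_id:
    class_index = [i for i, x in enumerate(labels) if x == ci]
    class_index = tf.cast(class_index, tf.int32)
    # each branch keeps its own copy of class_index inside the layer
    x = AggregationLayer(class_index)(retrieval_output)
    selection_layer_output.append(x)

concatenated_ouput = tf.stack(selection_layer_output, axis=0)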
This solution solves my problem.