I'm currently working on a fiber tip tracking task on an endoscopic video. For this purpose I have two models: one that classifies whether the fiber tip is visible in the frame, and one that regresses its (x, y) position.
I am using ResNet18 pretrained on ImageNet for both, and it works great. But I'm running into performance issues, so I decided to combine these two models into a single one using a multi-output approach. So far I haven't been able to get it to work.
TENSORFLOW:
TensorFlow version: 2.10.1
DATASET:
My dataset is stored in HDF5 format. Each sample has: an image, an is_visible flag (0 or 1), and the (x, y) position of the fiber tip.
I am loading this dataset using a custom generator as follows:
import tensorflow as tf

output_types = (tf.float32, tf.uint8, tf.float32)
output_shapes = (
    tf.TensorShape((None, image_height, image_width, number_of_channels)),  # image
    tf.TensorShape((None, 1)),        # is_visible
    tf.TensorShape((None, 1, 1, 2)),  # x, y
)
train_dataset = tf.data.Dataset.from_generator(
    generator, output_types=output_types, output_shapes=output_shapes,
)
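The generator itself is omitted here for brevity; roughly, it walks the HDF5 file batch by batch and yields three arrays matching the declared types and shapes. A simplified sketch (the file name and dataset keys are illustrative):

import h5py
import numpy as np

batch_size = 32  # example value

def generator():
    # Hypothetical HDF5 layout: the keys "images", "is_visible" and "position" are illustrative.
    with h5py.File("train.h5", "r") as f:
        n = f["images"].shape[0]
        for start in range(0, n, batch_size):
            end = start + batch_size
            yield (
                f["images"][start:end].astype(np.float32),    # (batch, height, width, channels)
                f["is_visible"][start:end].astype(np.uint8),  # (batch, 1)
                f["position"][start:end].astype(np.float32),  # (batch, 1, 1, 2)
            )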
MODEL:
My model is defined as follows:
# ResNet18 is not part of tf.keras.applications; assuming it comes from a third-party
# package such as qubvel's classification_models:
from classification_models.tfkeras import Classifiers
ResNet18, _ = Classifiers.get("resnet18")

model = ResNet18(input_shape=(224, 224, 3), weights="imagenet", include_top=False)
inputLayer = model.input
innerLayer = tf.keras.layers.Flatten()(model.output)
is_visible = tf.keras.layers.Dense(1, activation="sigmoid", name="is_visible")(innerLayer)
position = tf.keras.layers.Dense(2)(innerLayer)
position = tf.keras.layers.Reshape((1, 1, 2), name="position")(position)
model = tf.keras.Model(inputs=[inputLayer], outputs=[is_visible, position])
adam = tf.keras.optimizers.Adam(1e-4)
model.compile(
    optimizer=adam,
    loss={
        "is_visible": "binary_crossentropy",
        "position": "mean_squared_error",
    },
    loss_weights={
        "is_visible": 1.0,
        "position": 1.0,
    },
    metrics={
        "is_visible": "accuracy",
        "position": "mean_squared_error",
    },
)
PROBLEM:
The dataset itself works fine: I can loop through each batch. But when it comes to training:
model.fit(
    train_dataset,
    validation_data=validation_dataset,
    epochs=100000,
    callbacks=callbacks,
)
the following error is raised:
ValueError: Can not squeeze dim[3], expected a dimension of 1, got 2 for '{{node mean_squared_error/weighted_loss/Squeeze}} = Squeeze[T=DT_FLOAT, squeeze_dims=[-1]]' with input shapes: [?,1,1,2].
So I tried splitting the (x, y) position into two separate tensors:
output_types = (tf.float32, tf.uint8, tf.float32, tf.float32)
output_shapes = (
    tf.TensorShape((None, image_height, image_width, number_of_channels)),  # image
    tf.TensorShape((None, 1)),  # is_visible
    tf.TensorShape((None, 1)),  # x
    tf.TensorShape((None, 1)),  # y
)
But this leads to another error:
ValueError: Data is expected to be in format `x`, `(x,)`, `(x, y)`, or `(x, y, sample_weight)`, found: (<tf.Tensor 'IteratorGetNext:0' shape=(None, 224, 224, 3) dtype=float32>, <tf.Tensor 'IteratorGetNext:1' shape=(None, 1) dtype=uint8>, <tf.Tensor 'IteratorGetNext:2' shape=(None, 1) dtype=float32>, <tf.Tensor 'IteratorGetNext:3' shape=(None, 1) dtype=float32>)
I tried to wrap the is_visible and (x, y) values returned from train_dataset into a dictionary, like so:
yield image_batch, {"is_visible": is_visible_batch, "position": position_batch}
I also tried these options:
yield image_batch, (is_visible_batch, position_batch)
yield image_batch, [is_visible_batch, position_batch]
But that didn't help either.
Can anyone tell me what I am doing wrong? I am totally stuck ))
Answering my own question: I was able to make this work.
I have modified my code a little (removed redundant dimensions), but don't let that distract you; below you will find the exact modifications that solved the problem. My dataset now looks as follows:
output_types = (
    tf.float32,
    (
        tf.float32,
        tf.uint8,
    ),
)
output_shapes = (
    tf.TensorShape((256, 256, 3)),
    (
        tf.TensorShape((2,)),
        tf.TensorShape((1,)),
    ),
)
train_dataset = tf.data.Dataset.from_generator(
    generator, output_types=output_types, output_shapes=output_shapes,
)
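A side note: the output_types/output_shapes arguments of from_generator are deprecated in newer TF versions in favor of a single output_signature argument. If I'm not mistaken, the equivalent spec would be:

train_dataset = tf.data.Dataset.from_generator(
    generator,
    output_signature=(
        tf.TensorSpec(shape=(256, 256, 3), dtype=tf.float32),  # image
        (
            tf.TensorSpec(shape=(2,), dtype=tf.float32),  # position
            tf.TensorSpec(shape=(1,), dtype=tf.uint8),    # is_visible
        ),
    ),
)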
Note that each dataset element consists of two parts: the input tensor (the image) and the targets.
If you have multiple targets, you need to wrap them into a tuple, like so:
output_types = (
    # Input: in my case, a single input image
    tf.float32,  # image type
    # Output: in my case a multi-output dataset (and a multi-output model),
    # so the target types need to be wrapped into a tuple
    (
        tf.float32,  # type for the regression task - fiber tip position (coordinates normalized to the range [0; 1])
        tf.uint8,    # type for the classification task - is the fiber visible (0 - not visible, 1 - visible)
    ),
)
And similarly for the dataset shapes:
output_shapes = (
    # A single tensor for the input image
    tf.TensorShape((256, 256, 3)),
    # A tuple of tensors for the multiple outputs
    (
        tf.TensorShape((2,)),  # two coordinates for the x, y position
        tf.TensorShape((1,)),  # a single value for the classification task (visibility flag)
    ),
)
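With this structure, the generator has to yield each sample as the same nested tuple: the image first, then (position, is_visible). A minimal sketch of the modified yield (iterate_samples() and the variable names are hypothetical):

import numpy as np

def generator():
    for image, x, y, visible in iterate_samples():  # iterate_samples() is a hypothetical helper
        position = np.array([x, y], dtype=np.float32)     # shape (2,), coordinates in [0; 1]
        is_visible = np.array([visible], dtype=np.uint8)  # shape (1,)
        yield image, (position, is_visible)

Also note that these shapes are per-sample (there is no batch dimension any more), so the dataset must be batched before training:

train_dataset = train_dataset.batch(32)  # the batch size here is just an example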
And here is my model once again:
# I've switched the architecture to VGG16, but that's beside the point; this should
# work for any backbone as long as the outputs are configured properly.
# The input shape must match the (256, 256, 3) images declared in output_shapes above.
model = tf.keras.applications.VGG16(input_shape=(256, 256, 3), weights="imagenet", include_top=False)
model.trainable = False
for layer in model.layers[-3:]:
    layer.trainable = True
inputLayer = model.input
hiddenLayers = tf.keras.layers.Flatten(name="flatten")(model.output)
position = tf.keras.layers.Dense(2, activation="sigmoid", name="position")(hiddenLayers)
is_visible = tf.keras.layers.Dense(1, activation="sigmoid", name="is_visible")(hiddenLayers)
model = tf.keras.Model(inputs=[inputLayer], outputs=[position, is_visible])
adam = tf.keras.optimizers.Adam(1e-4)
model.compile(
    optimizer=adam,
    loss={
        "position": "mean_squared_error",
        "is_visible": "binary_crossentropy",
    },
    loss_weights={
        "position": 1.0,
        "is_visible": 1.0,
    },
    metrics={
        "position": "mean_squared_error",
        "is_visible": "accuracy",
    },
)
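One more detail: because the targets are passed as a plain tuple rather than a dict, Keras matches them to the model outputs by position, so (position, is_visible) in the dataset must follow the same order as outputs=[position, is_visible] in the model. A quick way to sanity-check the structures before training:

print(model.output_names)          # expect: ['position', 'is_visible']
print(train_dataset.element_spec)  # should mirror the (image, (position, is_visible)) nesting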
And finally, calling the fit method to train:
model.fit(
    train_dataset,
    validation_data=validation_dataset,
    epochs=100000,
    callbacks=callbacks,
)
Hope this helps any newbies who follow the same path.