I'm currently working on a fiber tip tracking task on an endoscopic video. For this purpose I have two models: one that classifies whether the fiber tip is visible in the frame, and one that regresses its (x, y) position.
I am using ResNet18 pretrained on ImageNet for both, and it works great. But I'm running into performance issues, so I decided to combine these two models into a single one using a multi-output approach. So far I haven't been able to get it to work.
TENSORFLOW:
TensorFlow version: 2.10.1
DATASET:
My dataset is stored in HDF5 format. Each sample has: an image, an is_visible flag (0 or 1), and the (x, y) position of the fiber tip.
I am loading this dataset using a custom generator as follows:
import tensorflow as tf

output_types = (tf.float32, tf.uint8, tf.float32)
output_shapes = (
    tf.TensorShape((None, image_height, image_width, number_of_channels)),  # image
    tf.TensorShape((None, 1)),        # is_visible
    tf.TensorShape((None, 1, 1, 2)),  # x, y
)
train_dataset = tf.data.Dataset.from_generator(
    generator, output_types=output_types, output_shapes=output_shapes,
)
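The generator itself is omitted here for brevity; roughly, it walks the HDF5 file batch by batch and yields three arrays matching the declared types and shapes. A simplified sketch (the file name and dataset keys are illustrative):

import h5py
import numpy as np

batch_size = 32  # example value

def generator():
    # Hypothetical HDF5 layout: the keys "images", "is_visible" and "position" are illustrative.
    with h5py.File("train.h5", "r") as f:
        n = f["images"].shape[0]
        for start in range(0, n, batch_size):
            end = start + batch_size
            yield (
                f["images"][start:end].astype(np.float32),    # (batch, height, width, channels)
                f["is_visible"][start:end].astype(np.uint8),  # (batch, 1)
                f["position"][start:end].astype(np.float32),  # (batch, 1, 1, 2)
            )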
MODEL:
My model is defined as follows:
# ResNet18 is not part of tf.keras.applications; assuming it comes from a third-party
# package such as qubvel's classification_models:
from classification_models.tfkeras import Classifiers
ResNet18, _ = Classifiers.get("resnet18")

model = ResNet18(input_shape=(224, 224, 3), weights="imagenet", include_top=False)
inputLayer = model.input
innerLayer = tf.keras.layers.Flatten()(model.output)
is_visible = tf.keras.layers.Dense(1, activation="sigmoid", name="is_visible")(innerLayer)
position = tf.keras.layers.Dense(2)(innerLayer)
position = tf.keras.layers.Reshape((1, 1, 2), name="position")(position)
model = tf.keras.Model(inputs=[inputLayer], outputs=[is_visible, position])
adam = tf.keras.optimizers.Adam(1e-4)
model.compile(
    optimizer=adam,
    loss={
        "is_visible": "binary_crossentropy",
        "position": "mean_squared_error",
    },
    loss_weights={
        "is_visible": 1.0,
        "position": 1.0,
    },
    metrics={
        "is_visible": "accuracy",
        "position": "mean_squared_error",
    },
)
PROBLEM:
The dataset itself works fine: I can loop through each batch. But when it comes to training:
model.fit(
    train_dataset,
    validation_data=validation_dataset,
    epochs=100000,
    callbacks=callbacks,
)
the following error is raised:
ValueError: Can not squeeze dim[3], expected a dimension of 1, got 2 for '{{node mean_squared_error/weighted_loss/Squeeze}} = Squeeze[T=DT_FLOAT, squeeze_dims=[-1]]' with input shapes: [?,1,1,2].
So I tried splitting the (x, y) position into two separate tensors:
output_types = (tf.float32, tf.uint8, tf.float32, tf.float32)
output_shapes = (
    tf.TensorShape((None, image_height, image_width, number_of_channels)),  # image
    tf.TensorShape((None, 1)),  # is_visible
    tf.TensorShape((None, 1)),  # x
    tf.TensorShape((None, 1)),  # y
)
But this leads to another error:
ValueError: Data is expected to be in format `x`, `(x,)`, `(x, y)`, or `(x, y, sample_weight)`, found: (<tf.Tensor 'IteratorGetNext:0' shape=(None, 224, 224, 3) dtype=float32>, <tf.Tensor 'IteratorGetNext:1' shape=(None, 1) dtype=uint8>, <tf.Tensor 'IteratorGetNext:2' shape=(None, 1) dtype=float32>, <tf.Tensor 'IteratorGetNext:3' shape=(None, 1) dtype=float32>)
I tried to wrap the is_visible and (x, y) values returned from train_dataset into a dictionary, like so:
yield image_batch, {"is_visible": is_visible_batch, "position": position_batch}
I also tried these options:
yield image_batch, (is_visible_batch, position_batch)
yield image_batch, [is_visible_batch, position_batch]
But that didn't help either.
Can anyone tell me what I am doing wrong? I am totally stuck ))
Answering my own question: I was able to make this work.
I have modified my code a little (removed redundant dimensions), but don't let that distract you; below you will find the exact modifications that solved the problem. My dataset now looks as follows:
output_types = (
    tf.float32,
    (
        tf.float32,
        tf.uint8,
    ),
)
output_shapes = (
    tf.TensorShape((256, 256, 3)),
    (
        tf.TensorShape((2,)),
        tf.TensorShape((1,)),
    ),
)
train_dataset = tf.data.Dataset.from_generator(
    generator, output_types=output_types, output_shapes=output_shapes,
)
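A side note: the output_types/output_shapes arguments of from_generator are deprecated in newer TF versions in favor of a single output_signature argument. If I'm not mistaken, the equivalent spec would be:

train_dataset = tf.data.Dataset.from_generator(
    generator,
    output_signature=(
        tf.TensorSpec(shape=(256, 256, 3), dtype=tf.float32),  # image
        (
            tf.TensorSpec(shape=(2,), dtype=tf.float32),  # position
            tf.TensorSpec(shape=(1,), dtype=tf.uint8),    # is_visible
        ),
    ),
)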
Note that each dataset element consists of two parts: the input tensor (the image) and the targets.
If you have multiple targets, you need to wrap them into a tuple, like so:
output_types = (
    # Input: in my case, a single input image
    tf.float32,  # image type
    # Output: in my case a multi-output dataset (and a multi-output model),
    # so the target types need to be wrapped into a tuple
    (
        tf.float32,  # type for the regression task - fiber tip position (coordinates normalized to the range [0; 1])
        tf.uint8,    # type for the classification task - is the fiber visible (0 - not visible, 1 - visible)
    ),
)
And similarly for the dataset shapes:
output_shapes = (
    # A single tensor for the input image
    tf.TensorShape((256, 256, 3)),
    # A tuple of tensors for the multiple outputs
    (
        tf.TensorShape((2,)),  # two coordinates for the x, y position
        tf.TensorShape((1,)),  # a single value for the classification task (visibility flag)
    ),
)
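With this structure, the generator has to yield each sample as the same nested tuple: the image first, then (position, is_visible). A minimal sketch of the modified yield (iterate_samples() and the variable names are hypothetical):

import numpy as np

def generator():
    for image, x, y, visible in iterate_samples():  # iterate_samples() is a hypothetical helper
        position = np.array([x, y], dtype=np.float32)     # shape (2,), coordinates in [0; 1]
        is_visible = np.array([visible], dtype=np.uint8)  # shape (1,)
        yield image, (position, is_visible)

Also note that these shapes are per-sample (there is no batch dimension any more), so the dataset must be batched before training:

train_dataset = train_dataset.batch(32)  # the batch size here is just an example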
And here is my model once again:
# I've switched the architecture to VGG16, but that's beside the point; this should
# work for any backbone as long as the outputs are configured properly.
# The input shape must match the (256, 256, 3) images declared in output_shapes above.
model = tf.keras.applications.VGG16(input_shape=(256, 256, 3), weights="imagenet", include_top=False)
model.trainable = False
for layer in model.layers[-3:]:
    layer.trainable = True
inputLayer = model.input
hiddenLayers = tf.keras.layers.Flatten(name="flatten")(model.output)
position = tf.keras.layers.Dense(2, activation="sigmoid", name="position")(hiddenLayers)
is_visible = tf.keras.layers.Dense(1, activation="sigmoid", name="is_visible")(hiddenLayers)
model = tf.keras.Model(inputs=[inputLayer], outputs=[position, is_visible])
adam = tf.keras.optimizers.Adam(1e-4)
model.compile(
    optimizer=adam,
    loss={
        "position": "mean_squared_error",
        "is_visible": "binary_crossentropy",
    },
    loss_weights={
        "position": 1.0,
        "is_visible": 1.0,
    },
    metrics={
        "position": "mean_squared_error",
        "is_visible": "accuracy",
    },
)
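One more detail: because the targets are passed as a plain tuple rather than a dict, Keras matches them to the model outputs by position, so (position, is_visible) in the dataset must follow the same order as outputs=[position, is_visible] in the model. A quick way to sanity-check the structures before training:

print(model.output_names)          # expect: ['position', 'is_visible']
print(train_dataset.element_spec)  # should mirror the (image, (position, is_visible)) nesting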
And finally, calling the fit method to train:
model.fit(
    train_dataset,
    validation_data=validation_dataset,
    epochs=100000,
    callbacks=callbacks,
)
Hope this helps any newbies who follow the same path.