Tags: tensorflow, keras, deep-learning, conv-neural-network, resnet

Can't get multi-output CNN to work (TensorFlow and Keras)


I'm currently working on fiber tip tracking in endoscopic video. For this I have two models:

  • a classifier that tells whether the image contains the fiber (is_visible)
  • a regressor that predicts the fiber tip position (x, y)

I am using ResNet18 pretrained on ImageNet for both models, and it works great. But I'm experiencing performance issues, so I decided to combine the two models into a single one using a multi-output approach. So far I haven't been able to get it to work.

TENSORFLOW:

TensorFlow version: 2.10.1

DATASET:

My dataset is stored in HDF5 format. Each sample has:

  • an image (224, 224, 3)
  • a uint8 visibility flag (is_visible)
  • two floats for the fiber tip position (x, y)

I am loading this dataset with a custom generator as follows:

output_types = (tf.float32, tf.uint8, tf.float32)
output_shapes = (
    tf.TensorShape((None, image_height, image_width, number_of_channels)),  # image
    tf.TensorShape((None, 1)),                                              # is_visible
    tf.TensorShape((None, 1, 1, 2)),                                        # x, y
)

train_dataset = tf.data.Dataset.from_generator(
    generator, output_types=output_types, output_shapes=output_shapes,
)
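
For reference, here is a simplified sketch of the generator itself (the HDF5 file name and dataset keys below are placeholders, not my actual layout):

import h5py
import numpy as np

batch_size = 32

def generator():
    # file name and dataset keys are placeholders for the actual HDF5 layout
    with h5py.File("fiber_dataset.h5", "r") as f:
        n = f["images"].shape[0]
        for start in range(0, n, batch_size):
            end = start + batch_size
            yield (
                f["images"][start:end].astype(np.float32),     # (batch, 224, 224, 3)
                f["is_visible"][start:end].astype(np.uint8),   # (batch, 1)
                f["positions"][start:end].astype(np.float32),  # (batch, 1, 1, 2)
            )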

MODEL:

My model is defined as follows:

model = ResNet18(input_shape=(224, 224, 3), weights="imagenet", include_top=False)
inputLayer = model.input
innerLayer = tf.keras.layers.Flatten()(model.output)

is_visible = tf.keras.layers.Dense(1, activation="sigmoid", name="is_visible")(innerLayer)

position = tf.keras.layers.Dense(2)(innerLayer)
position = tf.keras.layers.Reshape((1, 1, 2), name="position")(position)

model = tf.keras.Model(inputs=[inputLayer], outputs=[is_visible, position])
adam = tf.keras.optimizers.Adam(1e-4)
model.compile(
    optimizer=adam,
    loss={
        "is_visible": "binary_crossentropy",
        "position": "mean_squared_error",
    },
    loss_weights={
        "is_visible": 1.0,
        "position": 1.0
    },
    metrics={
        "is_visible": "accuracy",
        "position": "mean_squared_error"
    },
)

PROBLEM:

The dataset works great; I can loop through each batch. But when it comes to training:

model.fit(
    train_dataset,
    validation_data=validation_dataset,
    epochs=100000,
    callbacks=callbacks,
)
  1. I get the following error:

ValueError: Can not squeeze dim[3], expected a dimension of 1, got 2 for '{{node mean_squared_error/weighted_loss/Squeeze}} = Squeeze[T=DT_FLOAT, squeeze_dims=[-1]]' with input shapes: [?,1,1,2].

  2. I tried to change the dataset format like so:
output_types = (tf.float32, tf.uint8, tf.float32, tf.float32)
output_shapes = (
    tf.TensorShape((None, image_height, image_width, number_of_channels)),  # image
    tf.TensorShape((None, 1)),                                              # is_visible
    tf.TensorShape((None, 1)),                                              # x
    tf.TensorShape((None, 1)),                                              # y
)

But this leads to another error:

ValueError: Data is expected to be in format x, (x,), (x, y), or (x, y, sample_weight), found: (<tf.Tensor 'IteratorGetNext:0' shape=(None, 224, 224, 3) dtype=float32>, <tf.Tensor 'IteratorGetNext:1' shape=(None, 1) dtype=uint8>, <tf.Tensor 'IteratorGetNext:2' shape=(None, 1) dtype=float32>, <tf.Tensor 'IteratorGetNext:3' shape=(None, 1) dtype=float32>)

I tried wrapping is_visible and (x, y) yielded by the generator into a dictionary, like so:

yield image_batch, {"is_visible": is_visible_batch, "position": position_batch}

I also tried these options:

yield image_batch, (is_visible_batch, position_batch)
yield image_batch, [is_visible_batch, position_batch]

But that didn't help either.

Can anyone tell me what I'm doing wrong? I am totally stuck ))


Solution

  • Answering my own question: I was able to make this work.

    I have modified my code a little (removed redundant dimensions), but don't let that distract you. Below you will find the exact modifications that solved the problem. My dataset now looks as follows:

    output_types = (
        tf.float32,
        (
            tf.float32,
            tf.uint8
        )
    )
    output_shapes = (
        tf.TensorShape((256, 256, 3)),
        (
            tf.TensorShape((2,)),   # note the trailing comma: (2) is just an int, (2,) is a tuple
            tf.TensorShape((1,)),
        )
    )
    train_dataset = tf.data.Dataset.from_generator(
        generator, output_types=output_types, output_shapes=output_shapes,
    )
    

    Note that a dataset element now consists of two parts:

    • the first is a tensor for the input image
    • the second is a tuple holding the multiple output targets (pixel coordinates and visibility flag)

    This nesting is the key. If you have multiple targets, you need to wrap them into a tuple: Keras interprets a flat tuple as (x, y, sample_weight), so the 3-tuple version silently treated position as sample weights (hence the Squeeze error), and the 4-tuple version was rejected outright. (A dictionary yield can also work, but then output_types and output_shapes must be dictionaries with the same keys.) So the types are wrapped like so:

    output_types = (
        # input: in my case, a single input image
        tf.float32,     # image type

        # output: in my case, a multi-output dataset (and a multi-output model),
        # so the target types need to be wrapped into a tuple
        (
            tf.float32, # type for the regression task: fiber tip position (pixel coordinates in range [0; 1])
            tf.uint8    # type for the classification task: is the fiber visible (0 - not visible, 1 - visible)
        )
    )
    

    And similarly, the dataset shapes:

    output_shapes = (
        # single tensor for the input image
        tf.TensorShape((256, 256, 3)),

        # tuple of tensors for the multiple outputs
        (
            tf.TensorShape((2,)),    # two coordinates for the x, y position
            tf.TensorShape((1,)),    # single value for the classification task (visibility flag)
        )
    )
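
    The generator then has to yield elements with exactly this nested structure. Here is a simplified sketch (the HDF5 file name and keys are placeholders). Note that the shapes above have no batch dimension, so the dataset needs to be batched before training, for example with .batch():

    import h5py
    import numpy as np

    def generator():
        # file name and dataset keys are placeholders for the actual HDF5 layout
        with h5py.File("fiber_dataset.h5", "r") as f:
            for i in range(f["images"].shape[0]):
                image = f["images"][i].astype(np.float32)         # (256, 256, 3)
                position = f["positions"][i].astype(np.float32)   # (2,) - x, y in [0; 1]
                is_visible = f["is_visible"][i].astype(np.uint8)  # (1,)
                # the yielded structure mirrors output_types / output_shapes
                yield image, (position, is_visible)

    # a batch size of 32 is just an example
    train_dataset = train_dataset.batch(32).prefetch(tf.data.AUTOTUNE)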
    

    And here is the model once again:

    # I've switched the architecture to VGG16, but that's beside the point; it should work for any network
    # as long as the outputs are configured properly
    # (note that the input shape has to match the dataset images, which are now (256, 256, 3))
    model = tf.keras.applications.VGG16(input_shape=(256, 256, 3), weights="imagenet", include_top=False)
    
    model.trainable = False
    for layer in model.layers[-3:]:
        layer.trainable = True
    
    inputLayer = model.input
    
    hiddenLayers = tf.keras.layers.Flatten(name="flatten")(model.output)
    
    position = tf.keras.layers.Dense(2, activation="sigmoid", name="position")(hiddenLayers)
    is_visible = tf.keras.layers.Dense(1, activation="sigmoid", name="is_visible")(hiddenLayers)
    
    model = tf.keras.Model(inputs=[inputLayer], outputs=[position, is_visible])
    adam = tf.keras.optimizers.Adam(1e-4)
    model.compile(
        optimizer=adam,
        loss={
            "position": "mean_squared_error",
            "is_visible": "binary_crossentropy"
        },
        loss_weights={
            "position": 1.0,
            "is_visible": 1.0
        },
        metrics={
            "position": "mean_squared_error",
            "is_visible": "accuracy"
        }
    )
    

    And finally, calling the fit method to train:

    model.fit(
        train_dataset,
        validation_data=validation_dataset,
        epochs=100000,
        callbacks=callbacks,
    )
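
    After training, predict returns the outputs in the same order as they were passed to the model, so they can be unpacked like this (images here stands for any batch of input images):

    position_pred, is_visible_pred = model.predict(images)
    # position_pred:   (batch, 2) - x, y in range [0; 1]
    # is_visible_pred: (batch, 1) - visibility probability from the sigmoid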

    Hope this helps some newbies who follow the same path.