
Combine two differently shaped inputs in TensorFlow: images and landmark coordinates

I am getting frustrated trying to combine two input layers with different shapes that I want to feed to my model.

What I have:

I have the following two inputs:

X_train # shape (120, 224, 224, 1)
landmarks_x_train # shape (120, 478, 3)
X_val # shape (40, 224, 224, 1)
landmarks_x_val # shape (40, 478, 3)

So in this example I have 120 grayscale images of size (224, 224), and each image has one landmark “set” of 478 landmarks with x, y, z coordinates.
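Each image is therefore paired with 478 × 3 = 1434 numbers. A Dense branch expects one flat vector per sample, so the landmark array either has to be flattened beforehand or go through a Flatten layer inside the model. A minimal numpy sketch of the flattening, using zero arrays as stand-ins for the real data:

```python
import numpy as np

# Zero arrays with the shapes from the question (placeholders for real data)
landmarks_x_train = np.zeros((120, 478, 3), dtype=np.float32)
landmarks_x_val = np.zeros((40, 478, 3), dtype=np.float32)

# Collapse the (landmark, coordinate) axes into one feature axis per sample:
# 478 landmarks * 3 coordinates = 1434 features
landmarks_train_flat = landmarks_x_train.reshape(len(landmarks_x_train), -1)
landmarks_val_flat = landmarks_x_val.reshape(len(landmarks_x_val), -1)

print(landmarks_train_flat.shape)  # (120, 1434)
print(landmarks_val_flat.shape)    # (40, 1434)
```

Note that `landmarks_x_train.shape[1]` alone is 478, so an `Input` built from it would miss the coordinate axis; the input shape for the landmarks is either `(1434,)` after flattening or `(478, 3)` followed by a Flatten layer.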

The number 120 is just an example; the real dataset has many more images, each with its own set of landmarks.

As a model, I have built a ResNet50 myself with input_shape=(224, 224, 1).

Its output layer is x = Dense(7, activation='softmax')(x).

Before training the model, I create ImageDataGenerator flows like this:

datagen = ImageDataGenerator(horizontal_flip=True, fill_mode='nearest')

batch_size = 16
train_flow = datagen.flow(X_train, y_train, batch_size=batch_size)
val_flow = datagen.flow(X_val, y_val, batch_size=batch_size)
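These flows only yield image batches. `datagen.flow` also accepts a tuple `(images, extra_array)` as `x` and then yields `[image_batch, extra_batch]`, but the extra array is passed through unmodified, so with `horizontal_flip=True` the landmarks would no longer match the flipped images. Below is a minimal numpy sketch of a generator that keeps both inputs consistent, assuming landmark x coordinates are normalized to [0, 1]. `two_input_batches` is a hypothetical helper, not a Keras API. Mirroring a face also swaps which landmarks are "left" and "right"; this sketch does not remap indices, so a real dataset may additionally need a mirror permutation of the 478 landmark indices.

```python
import numpy as np

def two_input_batches(images, landmarks, labels, batch_size=16,
                      flip_prob=0.5, seed=None):
    """Yield ([image_batch, landmark_batch], label_batch) indefinitely.

    Hypothetical helper. Assumes images: (N, H, W, C) and landmarks: (N, K, 3)
    with x normalized to [0, 1]. A flipped sample gets its image columns
    reversed and its landmark x mapped to 1 - x, so both inputs agree.
    Left/right landmark indices are NOT swapped after mirroring.
    """
    rng = np.random.default_rng(seed)
    n = len(images)
    while True:
        order = rng.permutation(n)  # new shuffle for every pass over the data
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            x_img = images[idx]  # integer-array indexing returns a copy
            x_lm = landmarks[idx].astype(np.float32)
            flip = rng.random(len(idx)) < flip_prob
            x_img[flip] = x_img[flip, :, ::-1, :]      # mirror along the width axis
            x_lm[flip, :, 0] = 1.0 - x_lm[flip, :, 0]  # mirror normalized x
            yield [x_img, x_lm], labels[idx]
```

Because the generator never ends, `model.fit` needs explicit step counts, e.g. `model.fit(two_input_batches(X_train, landmarks_x_train, y_train), validation_data=two_input_batches(X_val, landmarks_x_val, y_val, flip_prob=0.0), steps_per_epoch=..., validation_steps=..., epochs=num_epochs)`, with `flip_prob=0.0` so validation data is not augmented.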

My training steps look like this:

model = ResNet.get_resnet_50_model() # my class where the model is located

optimizer = Adam(learning_rate=0.01)

model.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy'])
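With `Dense(7, activation='softmax')` and `loss='categorical_crossentropy'`, the labels are expected to be one-hot vectors of shape `(N, 7)`; integer class ids would instead need `sparse_categorical_crossentropy`. A small numpy sketch of the one-hot conversion, using made-up class ids (`tf.keras.utils.to_categorical` does the same job):

```python
import numpy as np

num_classes = 7  # matches Dense(7, activation='softmax')

# Hypothetical integer class ids for four samples
y_int = np.array([0, 3, 6, 3])

# Row i of the identity matrix has a single 1 in column i,
# so indexing with the class ids yields one-hot rows
y_onehot = np.eye(num_classes, dtype=np.float32)[y_int]
print(y_onehot.shape)  # (4, 7)
```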

num_epochs = 5

history = model.fit(train_flow,
                    epochs=num_epochs,
                    validation_data=val_flow,
                    steps_per_epoch=len(train_flow),
                    validation_steps=len(val_flow))
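Dividing `len(X_train)` by `batch_size` gives a float (7.5 for 120 samples and a batch size of 16), while Keras expects an integer number of steps. Rounding up keeps the final, smaller batch. The iterator returned by `datagen.flow` reports this same rounded-up count through `len(train_flow)`. The arithmetic for the example sizes:

```python
import math

batch_size = 16
n_train, n_val = 120, 40  # example sizes from the question

print(n_train / batch_size)  # 7.5, not a valid step count

# Round up so the last partial batch is still used
steps_per_epoch = math.ceil(n_train / batch_size)   # 7 full batches + 1 batch of 8
validation_steps = math.ceil(n_val / batch_size)    # 16 + 16 + 8
print(steps_per_epoch, validation_steps)  # 8 3
```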

Where the problem is:

I now want to combine those two inputs to build a better model that doesn't rely only on the images, as it does now.

I have tried several things I found on the web and also asked ChatGPT, but without luck.

The most promising way was to combine the two with a Keras Concatenate layer, like this:

model = ResNet.get_resnet_50_model()

landmarks_input = Input(shape=(landmarks_x_train.shape[1],), name='landmarks_input')

model_output = model.output

combined_input = concatenate([model_output, landmarks_input], name='combined_input')

model = Model(inputs=[model.input, landmarks_input], outputs=combined_input)

This gave me a model, but I was unable to adapt the training process to get it running.
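In this attempt the concatenated tensor is itself the model output: the ResNet's softmax probabilities are glued to the raw landmarks and no classification layer follows, so the output no longer matches the 7-class labels. Also, `Input(shape=(landmarks_x_train.shape[1],))` is `(478,)`, which drops the coordinate axis of the `(478, 3)` landmarks. A minimal sketch of a two-branch classifier follows. `build_image_branch` is a hypothetical small CNN standing in for the ResNet50 so the example runs quickly, and the training data is random placeholder data.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_image_branch(image_input):
    # Hypothetical stand-in for the ResNet50 feature extractor (no softmax head)
    x = layers.Conv2D(8, 3, strides=2, activation="relu")(image_input)
    x = layers.Conv2D(16, 3, strides=2, activation="relu")(x)
    return layers.GlobalAveragePooling2D()(x)

image_input = layers.Input(shape=(224, 224, 1), name="image_input")
landmarks_input = layers.Input(shape=(478, 3), name="landmarks_input")

image_features = build_image_branch(image_input)

# Landmark branch: flatten 478 x 3 coordinates, then learn a compact embedding
lm = layers.Flatten()(landmarks_input)
lm = layers.Dense(128, activation="relu")(lm)
lm = layers.Dense(64, activation="relu")(lm)

# Fuse both feature vectors, then classify into 7 classes
combined = layers.concatenate([image_features, lm], name="combined_features")
z = layers.Dense(64, activation="relu")(combined)
outputs = layers.Dense(7, activation="softmax")(z)

model = Model(inputs=[image_input, landmarks_input], outputs=outputs)
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])

# Random placeholder data with the shapes from the question
X_img = np.random.rand(16, 224, 224, 1).astype("float32")
X_lm = np.random.rand(16, 478, 3).astype("float32")
y = np.eye(7, dtype="float32")[np.random.randint(0, 7, size=16)]

# Inputs are passed as a list, in the same order as Model(inputs=[...])
history = model.fit([X_img, X_lm], y, batch_size=8, epochs=1, verbose=0)
```

To plug in the existing ResNet instead of the stand-in, one option (assuming the layer right before `Dense(7)` is the pooled feature layer) is `base = ResNet.get_resnet_50_model()`, then `image_input = base.input` and `image_features = base.layers[-2].output`. Whatever feeds the model during training has to yield `([image_batch, landmark_batch], labels)` in that same order.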

So now I hope someone can help me combine those two inputs, so I can train the model on both of them.


  • In Keras, mixed data and multiple inputs can be handled with the Keras functional API.

    Architecturally, you create two input branches, pass each one through its own layers (Dense layers in the example below), and then concatenate the outputs of the two branches before the final layers.

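    from tensorflow.keras.layers import Input, Dense, concatenate
    from tensorflow.keras.models import Model

    # define two sets of inputs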
    inputA = Input(shape=(32,))
    inputB = Input(shape=(128,))
    # the first branch operates on the first input
    x = Dense(8, activation="relu")(inputA)
    x = Dense(4, activation="relu")(x)
    x = Model(inputs=inputA, outputs=x)
    # the second branch operates on the second input
    y = Dense(64, activation="relu")(inputB)
    y = Dense(32, activation="relu")(y)
    y = Dense(4, activation="relu")(y)
    y = Model(inputs=inputB, outputs=y)
    # combine the output of the two branches
    combined = concatenate([x.output, y.output])
    # apply a FC layer and then a regression prediction on the
    # combined outputs
    z = Dense(2, activation="relu")(combined)
    z = Dense(1, activation="linear")(z)
    # our model will accept the inputs of the two branches and
    # then output a single value
    model = Model(inputs=[x.input, y.input], outputs=z)

    A full tutorial can be found here.

    Note: If your landmarks lie within the image dimensions, you could also add an extra channel to the image in which pixels containing a landmark hold the associated depth (z) value. Your input is then an image with two channels.
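The extra-channel idea from the note can be sketched in numpy as follows, assuming the landmark x and y values are normalized to [0, 1] relative to image width and height (if they are already pixel coordinates, the scaling step should be dropped). `add_landmark_channel` is a hypothetical helper name.

```python
import numpy as np

def add_landmark_channel(images, landmarks):
    """Return images with a second channel holding landmark z values.

    Hypothetical helper. Assumes images: (N, H, W, 1) and landmarks: (N, K, 3)
    with x, y normalized to [0, 1]. Pixels without a landmark stay 0; if two
    landmarks land on the same pixel, only one of their z values is kept.
    """
    n, h, w, _ = images.shape
    channel = np.zeros((n, h, w, 1), dtype=np.float32)
    # Scale normalized coordinates to integer pixel indices inside the image
    cols = np.clip(np.round(landmarks[..., 0] * (w - 1)).astype(int), 0, w - 1)
    rows = np.clip(np.round(landmarks[..., 1] * (h - 1)).astype(int), 0, h - 1)
    # Sample index for every landmark, matching the row-major ravel order
    sample = np.repeat(np.arange(n), landmarks.shape[1])
    channel[sample, rows.ravel(), cols.ravel(), 0] = landmarks[..., 2].ravel()
    return np.concatenate([images.astype(np.float32), channel], axis=-1)
```

The result has shape `(N, 224, 224, 2)`, so the ResNet would need `input_shape=(224, 224, 2)`. One advantage of this layout is that `ImageDataGenerator`'s `horizontal_flip` mirrors both channels together, keeping image and landmarks aligned (Keras may print a warning about the unusual channel count).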