Below is part of my Keras code. I am trying to understand what `add` does: why is the output of the Add layer (None, 38, 300) when it is adding two inputs with different shapes?
from keras.layers import (Input, Dense, BatchNormalization, Embedding,
                          Dropout, add, LSTM, Activation)

EMBEDDING_DIM = 300     # inferred from the reported output shape (None, 38, 300)
MAX_CAPTION_SIZE = 38   # inferred from the reported output shape
VOCABULARY_SIZE = 10000  # placeholder; the real value is not shown in the question

image_model = Input(shape=(2048,))
x = Dense(units=EMBEDDING_DIM, activation="relu")(image_model)  # (None, 300)
x = BatchNormalization()(x)

language_model = Input(shape=(MAX_CAPTION_SIZE,))
y = Embedding(input_dim=VOCABULARY_SIZE, output_dim=EMBEDDING_DIM)(language_model)  # (None, 38, 300)
y = Dropout(0.5)(y)

merged = add([x, y])  # (None, 38, 300)
merged = LSTM(256, return_sequences=False)(merged)
merged = Dense(units=VOCABULARY_SIZE)(merged)
merged = Activation("softmax")(merged)
This is a technique called broadcasting. You can find more details here: https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html
In the example below, the first input, with shape (16,), is broadcast along the second dimension (of size 2) of the second input, with shape (2, 16), so that the element-wise addition can happen.
import keras
import numpy as np

input1 = keras.layers.Input(shape=(16,))    # (None, 16)
input2 = keras.layers.Input(shape=(2, 16))  # (None, 2, 16)
added = keras.layers.Add()([input1, input2])  # (None, 2, 16): input1 is broadcast

model = keras.models.Model(inputs=[input1, input2], outputs=added)
output = model.predict([np.ones((1, 16)), np.ones((1, 2, 16))])
print(output.shape)
print(output)
(1, 2, 16)
[[[2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2.]
  [2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2.]]]
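The same rule explains the shapes in your model. A minimal NumPy sketch, assuming EMBEDDING_DIM=300 and MAX_CAPTION_SIZE=38 (inferred from the reported output shape (None, 38, 300)):

```python
import numpy as np

# x mirrors the Dense output: (batch, 300)
# y mirrors the Embedding output: (batch, 38, 300)
x = np.ones((1, 300))
y = np.ones((1, 38, 300))

# Broadcasting aligns shapes from the right:
#   (    300) vs (38, 300) -> x is repeated along the time axis (size 38)
result = x + y
print(result.shape)  # (1, 38, 300)
```

Since x is effectively treated as shape (1, 1, 300), it is added to every one of the 38 timesteps of y, which is why the Add layer reports (None, 38, 300).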