Below is part of my Keras code. I am trying to understand what `add` does: why is the output of the Add layer (None, 38, 300) when it is adding two inputs with different shapes?
from keras.layers import (Input, Dense, BatchNormalization, Embedding,
                          Dropout, add, LSTM, Activation)

EMBEDDING_DIM = 300     # inferred from the reported output shape (None, 38, 300)
MAX_CAPTION_SIZE = 38   # inferred from the reported output shape
VOCABULARY_SIZE = 10000  # placeholder; the real value is not shown in the question

image_model = Input(shape=(2048,))
x = Dense(units=EMBEDDING_DIM, activation="relu")(image_model)  # (None, 300)
x = BatchNormalization()(x)

language_model = Input(shape=(MAX_CAPTION_SIZE,))
y = Embedding(input_dim=VOCABULARY_SIZE, output_dim=EMBEDDING_DIM)(language_model)  # (None, 38, 300)
y = Dropout(0.5)(y)

merged = add([x, y])  # (None, 38, 300)
merged = LSTM(256, return_sequences=False)(merged)
merged = Dense(units=VOCABULARY_SIZE)(merged)
merged = Activation("softmax")(merged)
This is a technique called broadcasting. You can find more details here: https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html
In the example below, the first input, with shape (16,), is broadcast along the second dimension (of size 2) of the second input, with shape (2, 16), so that the element-wise addition can happen.
import keras
import numpy as np

input1 = keras.layers.Input(shape=(16,))    # (None, 16)
input2 = keras.layers.Input(shape=(2, 16))  # (None, 2, 16)
added = keras.layers.Add()([input1, input2])  # (None, 2, 16): input1 is broadcast

model = keras.models.Model(inputs=[input1, input2], outputs=added)
output = model.predict([np.ones((1, 16)), np.ones((1, 2, 16))])
print(output.shape)
print(output)
(1, 2, 16)
[[[2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2.]
  [2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2.]]]
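The same rule explains the shapes in your model. A minimal NumPy sketch, assuming EMBEDDING_DIM=300 and MAX_CAPTION_SIZE=38 (inferred from the reported output shape (None, 38, 300)):

```python
import numpy as np

# x mirrors the Dense output: (batch, 300)
# y mirrors the Embedding output: (batch, 38, 300)
x = np.ones((1, 300))
y = np.ones((1, 38, 300))

# Broadcasting aligns shapes from the right:
#   (    300) vs (38, 300) -> x is repeated along the time axis (size 38)
result = x + y
print(result.shape)  # (1, 38, 300)
```

Since x is effectively treated as shape (1, 1, 300), it is added to every one of the 38 timesteps of y, which is why the Add layer reports (None, 38, 300).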