
What happens when you merge branches in Keras with different shapes?


I am trying to understand what "add" does in Keras. Why is the output of the Add layer (None, 38, 300) when the two inputs being added have different shapes?

Following is the partial code.

from keras.layers import (Input, Dense, BatchNormalization, Embedding,
                          Dropout, LSTM, Activation, add)

# Example values: EMBEDDING_DIM and MAX_CAPTION_SIZE are implied by the
# (None, 38, 300) output; VOCABULARY_SIZE is an arbitrary placeholder.
EMBEDDING_DIM = 300
MAX_CAPTION_SIZE = 38
VOCABULARY_SIZE = 10000

image_model = Input(shape=(2048,))
x = Dense(units=EMBEDDING_DIM, activation="relu")(image_model)  # -> (None, 300)
x = BatchNormalization()(x)

language_model = Input(shape=(MAX_CAPTION_SIZE,))
y = Embedding(input_dim=VOCABULARY_SIZE, output_dim=EMBEDDING_DIM)(language_model)  # -> (None, 38, 300)
y = Dropout(0.5)(y)

merged = add([x, y])  # -> (None, 38, 300)
merged = LSTM(256, return_sequences=False)(merged)
merged = Dense(units=VOCABULARY_SIZE)(merged)
merged = Activation("softmax")(merged)
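
For reference, the two tensors going into add can be inspected directly. A quick check (a sketch, using the example constant values above):

from keras import backend as K

print(K.int_shape(x))  # (None, 300): batch x features
print(K.int_shape(y))  # (None, 38, 300): batch x timesteps x features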



Solution

  • Why is the output of the Add layer (None, 38, 300) when the two inputs being added have different shapes?

    It's a technique called broadcasting. You can find more details here: https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html
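
    The same rule can be seen with plain NumPy arrays, independent of Keras. A minimal sketch:

    import numpy as np

    a = np.ones((16,))    # shape (16,)
    b = np.ones((2, 16))  # shape (2, 16)
    c = a + b             # a is stretched across the axis of size 2
    print(c.shape)        # (2, 16)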

    In the Keras example below, the first input, of shape (16,), is broadcast across the dimension of size 2 in the second input, of shape (2, 16), so that the element-wise addition can happen.

    import keras
    import numpy as np

    # Two inputs whose batch shapes differ: (None, 16) vs. (None, 2, 16)
    input1 = keras.layers.Input(shape=(16,))
    input2 = keras.layers.Input(shape=(2, 16))
    added = keras.layers.Add()([input1, input2])
    model = keras.models.Model(inputs=[input1, input2], outputs=added)
    output = model.predict([np.ones((1, 16)), np.ones((1, 2, 16))])
    print(output.shape)
    print(output)
    

    (1, 2, 16)

    [[[2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2.]
      [2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2.]]]
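
    Applying the same rule to the question's shapes (a sketch, assuming EMBEDDING_DIM = 300 and MAX_CAPTION_SIZE = 38, as the (None, 38, 300) output suggests), x of shape (None, 300) is broadcast across the 38 timesteps of y, shape (None, 38, 300). In other words, the image vector is added to the word embedding at every timestep. Continuing from the snippet above:

    input1 = keras.layers.Input(shape=(300,))     # stands in for x after the Dense layer
    input2 = keras.layers.Input(shape=(38, 300))  # stands in for y after the Embedding layer
    added = keras.layers.Add()([input1, input2])
    print(keras.backend.int_shape(added))         # (None, 38, 300)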