
How to use a Keras merge layer for an autoencoder with two outputs


Assume I have two inputs, X and Y, and I want to design a joint autoencoder to reconstruct X' and Y'.

As in the figure, X is the audio input and Y is the video input. This deep architecture is cool since it has two inputs with two outputs; moreover, the two branches share a layer in the middle. My question is how to write this autoencoder in Keras. Let's assume each layer is fully connected except the shared layer in the middle.

Here is my code so far:

 from keras.layers import Input, Dense
 from keras.models import Model
 import numpy as np

 X = np.random.random((1000, 100))
 Y = np.random.random((1000, 300))  # X and Y can have different sizes

 # the X autoencoder layer 

 Xinput = Input(shape=(100,))

 encoded = Dense(50, activation='relu')(Xinput)
 encoded = Dense(20, activation='relu')(encoded)
 encoded = Dense(15, activation='relu')(encoded)

 decoded = Dense(20, activation='relu')(encoded)
 decoded = Dense(50, activation='relu')(decoded)
 decoded = Dense(100, activation='relu')(decoded)



 # the Y autoencoder layer 
 Yinput = Input(shape=(300,))

 encoded = Dense(120, activation='relu')(Yinput)
 encoded = Dense(50, activation='relu')(encoded)
 encoded = Dense(15, activation='relu')(encoded)

 decoded = Dense(50, activation='relu')(encoded)
 decoded = Dense(120, activation='relu')(decoded)
 decoded = Dense(300, activation='relu')(decoded)

I simply gave the middle layer 15 nodes for both X and Y. My question is: how do I train this joint autoencoder with the loss function \|X-X'\|^2 + \|Y-Y'\|^2?

Thanks


Solution

  • As your code stands, you have two separate models. While you can simply use the output of your shared representation layer twice for the two following subnets, you have to merge the two input subnets:

    from keras.layers import Input, Dense, Concatenate
    from keras.models import Model

    Xinput = Input(shape=(100,))
    Yinput = Input(shape=(300,))
    
    Xencoded = Dense(50, activation='relu')(Xinput)
    Xencoded = Dense(20, activation='relu')(Xencoded)
    
    
    Yencoded = Dense(120, activation='relu')(Yinput)
    Yencoded = Dense(50, activation='relu')(Yencoded)
    
    shared_input = Concatenate()([Xencoded, Yencoded])
    shared_output = Dense(15, activation='relu')(shared_input)
    
    Xdecoded = Dense(20, activation='relu')(shared_output)
    Xdecoded = Dense(50, activation='relu')(Xdecoded)
    Xdecoded = Dense(100, activation='relu')(Xdecoded)
    
    Ydecoded = Dense(50, activation='relu')(shared_output)
    Ydecoded = Dense(120, activation='relu')(Ydecoded)
    Ydecoded = Dense(300, activation='relu')(Ydecoded)
    

    Now you have two separate outputs, so you need two separate loss functions, which Keras will add together (weighted by loss_weights). For compiling the model:

    model = Model([Xinput, Yinput], [Xdecoded, Ydecoded])
    model.compile(optimizer='adam', loss=['mse', 'mse'], loss_weights=[1., 1.])
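As a side note, Keras's 'mse' loss averages the squared error over all elements, so the summed objective matches the question's \|X-X'\|^2 + \|Y-Y'\|^2 only up to constant factors, which does not change the optimum. A quick numpy sketch illustrates the bookkeeping; here Xp and Yp are hypothetical stand-ins for the model's reconstructions:

```python
import numpy as np

X = np.random.random((1000, 100))
Y = np.random.random((1000, 300))

# Hypothetical reconstructions (in practice these come from model.predict)
Xp = np.random.random((1000, 100))
Yp = np.random.random((1000, 300))

# Keras 'mse' averages the squared error over all elements of each output
mse_x = np.mean((X - Xp) ** 2)
mse_y = np.mean((Y - Yp) ** 2)

# With loss_weights=[1., 1.] the total training loss is the plain weighted sum
total = 1.0 * mse_x + 1.0 * mse_y
print(total)
```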
    

    You can then simply train the model; since this is an autoencoder, the inputs double as the targets:

    model.fit([X, Y], [X, Y])
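
Putting the pieces together, here is a complete runnable sketch. It assumes tf.keras (the question's `from keras import ...` works the same for standalone Keras 2), uses the question's random X and Y as both inputs and targets, and afterwards reads out the shared 15-dimensional code with a second Model that reuses the trained layers:

```python
import numpy as np
from tensorflow.keras.layers import Input, Dense, Concatenate
from tensorflow.keras.models import Model

X = np.random.random((1000, 100))
Y = np.random.random((1000, 300))

Xinput = Input(shape=(100,))
Yinput = Input(shape=(300,))

# Two encoder branches
Xencoded = Dense(50, activation='relu')(Xinput)
Xencoded = Dense(20, activation='relu')(Xencoded)

Yencoded = Dense(120, activation='relu')(Yinput)
Yencoded = Dense(50, activation='relu')(Yencoded)

# Shared 15-dimensional bottleneck
shared_input = Concatenate()([Xencoded, Yencoded])
shared_output = Dense(15, activation='relu')(shared_input)

# Two decoder branches
Xdecoded = Dense(20, activation='relu')(shared_output)
Xdecoded = Dense(50, activation='relu')(Xdecoded)
Xdecoded = Dense(100, activation='relu')(Xdecoded)

Ydecoded = Dense(50, activation='relu')(shared_output)
Ydecoded = Dense(120, activation='relu')(Ydecoded)
Ydecoded = Dense(300, activation='relu')(Ydecoded)

model = Model([Xinput, Yinput], [Xdecoded, Ydecoded])
model.compile(optimizer='adam', loss=['mse', 'mse'], loss_weights=[1., 1.])

# Inputs double as targets: the autoencoder reconstructs X and Y
model.fit([X, Y], [X, Y], epochs=1, batch_size=32, verbose=0)

# The shared code can be extracted with an encoder model
# that reuses the same (now trained) layers
encoder = Model([Xinput, Yinput], shared_output)
codes = encoder.predict([X, Y], verbose=0)
print(codes.shape)  # (1000, 15)
```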