Encoder input shape differs from decoder output shape

Hi all, I am working with this VAE code from MachineCurve.

The encoder has the following architecture; the inputs are 28x28 images:

i       = Input(shape=input_shape, name='encoder_input')
cx      = Conv2D(filters=128, kernel_size=5, strides=2, padding='same', activation='relu')(i)
cx      = BatchNormalization()(cx)
cx      = Conv2D(filters=256, kernel_size=5, strides=2, padding='same', activation='relu')(cx)
cx      = BatchNormalization()(cx)
cx      = Conv2D(filters=512, kernel_size=5, strides=2, padding='same', activation='relu')(cx)
cx      = BatchNormalization()(cx)
cx      = Conv2D(filters=1024, kernel_size=5, strides=2, padding='same', activation='relu')(cx)
cx      = BatchNormalization()(cx)
x       = Flatten()(cx)
x       = Dense(20, activation='relu')(x)
x       = BatchNormalization()(x)
mu      = Dense(latent_dim, name='latent_mu')(x)
sigma   = Dense(latent_dim, name='latent_sigma')(x)

The decoder is as follows; it tries to reverse the layers defined in the encoder:

d_i   = Input(shape=(latent_dim, ), name='decoder_input')
x     = Dense(conv_shape[1] * conv_shape[2] * conv_shape[3], activation='relu')(d_i)
x     = BatchNormalization()(x)
x     = Reshape((conv_shape[1], conv_shape[2], conv_shape[3]))(x)
cx    = Conv2DTranspose(filters=1024, kernel_size=5, strides=2, padding='same', activation='relu')(x)
cx    = BatchNormalization()(cx)
cx    = Conv2DTranspose(filters=512, kernel_size=5, strides=2, padding='same', activation='relu')(cx)
cx    = BatchNormalization()(cx)
cx    = Conv2DTranspose(filters=256, kernel_size=5, strides=2, padding='same', activation='relu')(cx)
cx    = BatchNormalization()(cx)
cx    = Conv2DTranspose(filters=128, kernel_size=5, strides=2, padding='same', activation='relu')(cx)
cx    = BatchNormalization()(cx)
o     = Conv2DTranspose(filters=num_channels, kernel_size=3, activation='sigmoid', padding='same', name='decoder_output')(cx)

The shape of encoder_input should be the same as the shape of decoder_output, but as the summary below shows, it is not (28x28 vs. 32x32):

Model: "vae"
Layer (type)                 Output Shape              Param #   
encoder_input (InputLayer)   (None, 28, 28, 1)         0         
encoder (Model)              [(None, 2), (None, 2), (N 17298104  
decoder (Model)              (None, 32, 32, 1)         43457025  
Total params: 60,755,129
Trainable params: 60,739,217
Non-trainable params: 15,912

When the model is trained, I get this error:

InvalidArgumentError:  Incompatible shapes: [100352] vs. [131072]
     InvalidArgumentError:  Incompatible shapes: [100352] vs. [131072]
     [[node gradients/loss/decoder_loss/kl_reconstruction_loss/mul_1_grad/BroadcastGradientArgs (defined at C:\Users\XXXXX\.conda\envs\keypoints\lib\site-packages\tensorflow_core\python\framework\ ]] [Op:__inference_keras_scratch_graph_22124]

Function call stack:

Do you have any idea how to solve this issue?

I also define:

# Define sampling with reparameterization trick
def sample_z(args):
    mu, sigma = args
    batch     = K.shape(mu)[0]
    dim       = K.int_shape(mu)[1]
    eps       = K.random_normal(shape=(batch, dim))
    return mu + K.exp(sigma / 2) * eps

# Use reparameterization trick to ensure correct gradient
z       = Lambda(sample_z, output_shape=(latent_dim, ), name='z')([mu, sigma])
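For reference, the sampling step above computes z = mu + exp(sigma / 2) * eps with eps drawn from a standard normal. That expression only yields a standard deviation if `sigma` holds the log-variance, which is how it is read here. The following is a minimal framework-free sketch of the same formula, not the Keras layer itself; the values of `mu` and `log_var` and the helper `mean_and_std` are made up for illustration.

```python
import math
import random

def sample_z(mu, log_var, rng):
    """Return mu + exp(log_var / 2) * eps per dimension, with eps ~ N(0, 1)."""
    return [m + math.exp(lv / 2) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]

def mean_and_std(values):
    """Empirical mean and (population) standard deviation of a list."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(var)

rng = random.Random(0)                  # fixed seed so the run is repeatable
mu, log_var = [1.0, -2.0], [0.0, -1.0]  # illustrative values, latent_dim = 2
samples = [sample_z(mu, log_var, rng) for _ in range(20000)]

for d in range(len(mu)):
    mean, std = mean_and_std([s[d] for s in samples])
    print(f"dim {d}: mean={mean:.2f} (mu={mu[d]}), "
          f"std={std:.2f} (exp(log_var/2)={math.exp(log_var[d] / 2):.2f})")
```

The empirical mean and standard deviation of the samples should land close to mu and exp(log_var / 2). Because the randomness enters only through eps, gradients can flow through mu and sigma.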

The encoder is then defined as:

encoder = Model(i, [mu, sigma, z], name='encoder')

The architecture is:

Model: "encoder"
Layer (type)                    Output Shape         Param #     Connected to                     
encoder_input (InputLayer)      (None, 28, 28, 1)    0                                            
conv2d_10 (Conv2D)              (None, 14, 14, 128)  3328        encoder_input[0][0]              
batch_normalization_25 (BatchNo (None, 14, 14, 128)  512         conv2d_10[0][0]                  
conv2d_11 (Conv2D)              (None, 7, 7, 256)    819456      batch_normalization_25[0][0]     
batch_normalization_26 (BatchNo (None, 7, 7, 256)    1024        conv2d_11[0][0]                  
conv2d_12 (Conv2D)              (None, 4, 4, 512)    3277312     batch_normalization_26[0][0]     
batch_normalization_27 (BatchNo (None, 4, 4, 512)    2048        conv2d_12[0][0]                  
conv2d_13 (Conv2D)              (None, 2, 2, 1024)   13108224    batch_normalization_27[0][0]     
batch_normalization_28 (BatchNo (None, 2, 2, 1024)   4096        conv2d_13[0][0]                  
flatten_4 (Flatten)             (None, 4096)         0           batch_normalization_28[0][0]     
dense_7 (Dense)                 (None, 20)           81940       flatten_4[0][0]                  
batch_normalization_29 (BatchNo (None, 20)           80          dense_7[0][0]                    
latent_mu (Dense)               (None, 2)            42          batch_normalization_29[0][0]     
latent_sigma (Dense)            (None, 2)            42          batch_normalization_29[0][0]     
z (Lambda)                      (None, 2)            0           latent_mu[0][0]                  
Total params: 17,298,104

Similarly, the decoder is defined as:

decoder = Model(d_i, o, name='decoder')

The architecture of the decoder is:

Model: "decoder"
Layer (type)                 Output Shape              Param #   
decoder_input (InputLayer)   (None, 2)                 0         
dense_8 (Dense)              (None, 4096)              12288     
batch_normalization_30 (Batc (None, 4096)              16384     
reshape_4 (Reshape)          (None, 2, 2, 1024)        0         
conv2d_transpose_10 (Conv2DT (None, 4, 4, 1024)        26215424  
batch_normalization_31 (Batc (None, 4, 4, 1024)        4096      
conv2d_transpose_11 (Conv2DT (None, 8, 8, 512)         13107712  
batch_normalization_32 (Batc (None, 8, 8, 512)         2048      
conv2d_transpose_12 (Conv2DT (None, 16, 16, 256)       3277056   
batch_normalization_33 (Batc (None, 16, 16, 256)       1024      
conv2d_transpose_13 (Conv2DT (None, 32, 32, 128)       819328    
batch_normalization_34 (Batc (None, 32, 32, 128)       512       
decoder_output (Conv2DTransp (None, 32, 32, 1)         1153      
Total params: 43,457,025
Trainable params: 43,444,993
Non-trainable params: 12,032

And finally we put it all together:

# =================
# VAE as a whole
# =================
# Instantiate VAE
vae_outputs = decoder(encoder(i)[2])
vae         = Model(i, vae_outputs, name='vae')


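  • This problem is caused by the output shape of your decoder. The four stride-2 Conv2DTranspose layers double the 2x2 feature map four times (2 -> 4 -> 8 -> 16 -> 32), so the decoder outputs 32x32 images while the input is 28x28. You can solve it by replacing the final layer of your decoder with a Conv2D that uses kernel_size=5 and the default padding='valid', which trims 32 down to 32 - 5 + 1 = 28: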

    Conv2D(filters=num_channels, kernel_size=5, activation='sigmoid', name='decoder_output')

    Here is the full code:

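    # Imports assume tf.keras on TensorFlow 2.x; adjust them if you use standalone Keras
    import tensorflow.keras.backend as K
    from tensorflow.keras.layers import (Input, Conv2D, Conv2DTranspose, BatchNormalization,
                                         Flatten, Dense, Reshape, Lambda)
    from tensorflow.keras.models import Model

    num_channels = 1
    latent_dim = 2
    input_shape = (28,28,1)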
    i       = Input(shape=input_shape, name='encoder_input')
    cx      = Conv2D(filters=128, kernel_size=5, strides=2, padding='same', activation='relu')(i)
    cx      = BatchNormalization()(cx)
    cx      = Conv2D(filters=256, kernel_size=5, strides=2, padding='same', activation='relu')(cx)
    cx      = BatchNormalization()(cx)
    cx      = Conv2D(filters=512, kernel_size=5, strides=2, padding='same', activation='relu')(cx)
    cx      = BatchNormalization()(cx)
    cx      = Conv2D(filters=1024, kernel_size=5, strides=2, padding='same', activation='relu')(cx)
    cx      = BatchNormalization()(cx)
    x       = Flatten()(cx)
    x       = Dense(20, activation='relu')(x)
    x       = BatchNormalization()(x)
    mu      = Dense(latent_dim, name='latent_mu')(x)
    sigma   = Dense(latent_dim, name='latent_sigma')(x)
    conv_shape = K.int_shape(cx)
    d_i   = Input(shape=(latent_dim, ), name='decoder_input')
    x     = Dense(conv_shape[1] * conv_shape[2] * conv_shape[3], activation='relu')(d_i)
    x     = BatchNormalization()(x)
    x     = Reshape(conv_shape[1:])(x)
    cx    = Conv2DTranspose(filters=1024, kernel_size=5, strides=2, padding='same', activation='relu')(x)
    cx    = BatchNormalization()(cx)
    cx    = Conv2DTranspose(filters=512, kernel_size=5, strides=2, padding='same', activation='relu')(cx)
    cx    = BatchNormalization()(cx)
    cx    = Conv2DTranspose(filters=256, kernel_size=5, strides=2, padding='same', activation='relu')(cx)
    cx    = BatchNormalization()(cx)
    cx    = Conv2DTranspose(filters=128, kernel_size=5, strides=2, padding='same', activation='relu')(cx)
    cx    = BatchNormalization()(cx)
    o     = Conv2D(filters=num_channels, kernel_size=5, activation='sigmoid', name='decoder_output')(cx)

    Sampling layer:

    def sample_z(args):
        mu, sigma = args
        batch     = K.shape(mu)[0]
        dim       = K.int_shape(mu)[1]
        eps       = K.random_normal(shape=(batch, dim))
        return mu + K.exp(sigma / 2) * eps
    # Use reparameterization trick to ensure correct gradient
    z       = Lambda(sample_z, output_shape=(latent_dim, ), name='z')([mu, sigma])

    Final VAE:

    encoder = Model(i, [mu, sigma, z], name='encoder')
    decoder = Model(d_i, o, name='decoder')
    vae_outputs = decoder(encoder(i)[2])
    vae         = Model(i, vae_outputs, name='vae')


    Layer (type)                 Output Shape              Param #   
    encoder_input (InputLayer)   [(None, 28, 28, 1)]       0         
    encoder (Model)              [(None, 2), (None, 2), (N 17298104  
    decoder (Model)              (None, 28, 28, 1)         43459073  

    As you can see, the input and output shapes now match.
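The shape arithmetic behind this can be checked without building the model. The sketch below assumes the usual Keras rules: a strided convolution with padding='same' gives ceil(size / stride), a transposed convolution with padding='same' gives size * stride, and a stride-1 convolution with padding='valid' gives size - kernel + 1. The helper names are made up for illustration and are not part of Keras.

```python
import math

def conv_same(size, stride):
    """Output size of a strided convolution with padding='same'."""
    return math.ceil(size / stride)

def conv_transpose_same(size, stride):
    """Output size of a strided transposed convolution with padding='same'."""
    return size * stride

def conv_valid(size, kernel):
    """Output size of a stride-1 convolution with padding='valid'."""
    return size - kernel + 1

def trace(size, steps):
    """Apply each (function, argument) step in turn and record every size."""
    sizes = [size]
    for fn, arg in steps:
        size = fn(size, arg)
        sizes.append(size)
    return sizes

# Encoder: four stride-2 Conv2D layers on a 28x28 input
encoder_sizes = trace(28, [(conv_same, 2)] * 4)
# Decoder: four stride-2 Conv2DTranspose layers starting from the encoder's last size
decoder_sizes = trace(encoder_sizes[-1], [(conv_transpose_same, 2)] * 4)
# Fix: final Conv2D with kernel_size=5 and padding='valid'
fixed_output = conv_valid(decoder_sizes[-1], 5)

print(encoder_sizes)   # [28, 14, 7, 4, 2]
print(decoder_sizes)   # [2, 4, 8, 16, 32]
print(fixed_output)    # 28

# The two numbers in the error message are per-batch element counts
batch = 100352 // (28 * 28)
print(batch, batch * 32 * 32)   # 128 131072
```

The mismatch comes from the 7 -> 4 step in the encoder: 'same' padding rounds up, and doubling 4 gives 8 rather than 7, so the decoder ends at 32 instead of 28. The error's 100352 and 131072 are 128 * 28 * 28 and 128 * 32 * 32, which suggests a batch size of 128.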