Keras - Variational Autoencoder Incompatible shape

I am trying to adapt the code to perform 1-D convolution on 1-D input. The model compiles, so you can see the layers and shapes in .summary(), but it throws an error when I call .fit() on the model. The error seems to occur in the loss computation. Below is my code:

import numpy as np
from scipy.stats import norm

from keras.layers import Input, Dense, Lambda, Flatten, Reshape
from keras.layers import Conv1D, UpSampling1D
from keras.models import Model
from keras import backend as K
from keras import metrics

num_conv = 6
batch_size = 100
latent_dim = 2
intermediate_dim = 128
epochs = 50
epsilon_std = 1.0

x = Input(batch_shape=(batch_size, 310, 1)) 
conv_1 = Conv1D(1, kernel_size=num_conv,
                padding='same', activation='relu')(x)
conv_2 = Conv1D(64, kernel_size=num_conv,
                padding='same', strides=2, activation='relu')(conv_1)
conv_3 = Conv1D(64, kernel_size=num_conv,
                padding='same', activation='relu')(conv_2)

flatten = Flatten()(conv_3)
hidden = Dense(intermediate_dim, activation='relu')(flatten)

z_mean = Dense(latent_dim)(hidden)
z_log_var = Dense(latent_dim)(hidden)

def sampling(args):
    z_mean, z_log_var = args
    epsilon = K.random_normal(shape=(batch_size, latent_dim), 
                              mean=0., stddev=epsilon_std)
    return(z_mean + K.exp(z_log_var/2) * epsilon)

z = Lambda(sampling, output_shape=(latent_dim,))([z_mean, z_log_var])

decoder_h = Dense(256, activation='relu')(z)
decoder = Dense(155, activation='relu')(decoder_h)
decoder = Reshape((155, 1))(decoder)
de_conv_1 = Conv1D(64, kernel_size=num_conv, 
                   padding='same', activation='relu')(decoder)
de_conv_2 = Conv1D(64, kernel_size=num_conv,
                   padding='same', activation='relu')(de_conv_1)
upsamp = UpSampling1D(2)(de_conv_2)
x_decoded_mean = Conv1D(1, kernel_size=num_conv,
                        padding='same', activation='relu')(upsamp)
x_decoded_mean = Reshape([310, 1])(x_decoded_mean)

def vae_loss(x, x_decoded_mean):
    x_ = x[:, 150:160, :]
    x_decoded_mean_ = x_decoded_mean[:, 150:160, :]
    xent_loss = 10 * metrics.mean_squared_error(x_, x_decoded_mean_)
    kl_loss = - 0.5 * K.sum(1 + z_log_var - K.square(z_mean) -K.exp(z_log_var), axis=-1)
    return(xent_loss + kl_loss)

vae = Model(x, x_decoded_mean)
vae.compile(optimizer='rmsprop', loss=vae_loss)

The input data shape is (n_sample, 310, 1). It is a 1-D time series, but I include the 150 frames before and the 150 frames after the middle 10 frames in order to predict them, resulting in 310 frames as input.

In vae_loss(), x and x_decoded_mean are sliced because the goal is to reconstruct only the middle 10 frames, using the preceding and following 150 frames as additional context. Therefore I want to force the model to focus on the loss computed only from the middle 10 frames.

I got the following error when I .fit() the model:

# X.shape == (n_samples, 310, 1)
# n_samples % batch_size == 0, y=X, batch_size=batch_size,

The error is shown below:

Epoch 1/50
InvalidArgumentError (see above for traceback): Incompatible shapes: [100,10] vs. [100]
         [[Node: gradients_4/add_121_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _class=["loc:@add_121"], _device="/job:localhost/replica:0/task:0/cpu:0"](gradients_4/add_121_grad/Shape, gradients_4/add_121_grad/Shape_1)]]

Based on the line Incompatible shapes: [100,10] vs. [100], I believe the error happens in the loss computation, but I can't figure out the solution. Moreover, even if I don't do the slicing in vae_loss(), the error still appears, as Incompatible shapes: [100,310] vs. [100]. Could anyone please give me some suggestions?


  • The problem is that xent_loss is a 2D tensor with shape (100, 10), while kl_loss is a 1D tensor with shape (100,). In TensorFlow, it is invalid to add these two tensors, because broadcasting aligns shapes from the last dimension and 10 does not match 100. See this section from the official doc.

    Consider the previous example, instead of adding a scalar to a (2,3) matrix, add a vector of dimension (3) to a matrix of dimensions (2,3). Without specifying broadcasting, this operation is invalid. To correctly request matrix-vector addition, specify the broadcasting dimension to be (1), meaning the vector's dimension is matched to dimension 1 of the matrix.

    This occurs because metrics.mean_squared_error() averages over the last axis only (here the feature axis), but not over the time axis, so a (100, 10, 1) input yields a (100, 10) result.

    To fix this problem, either take another K.mean() over the time axis:

    xent_loss = 10 * K.mean(metrics.mean_squared_error(x_, x_decoded_mean_), axis=-1)

    or use K.squeeze() to remove the feature axis before feeding the tensors into metrics.mean_squared_error() (but this only applies to 1D time series, where the feature axis has size 1):

    x_ = K.squeeze(x[:, 150:160, :], axis=-1)
    x_decoded_mean_ = K.squeeze(x_decoded_mean[:, 150:160, :], axis=-1)
    xent_loss = 10 * metrics.mean_squared_error(x_, x_decoded_mean_)

    However, the best way is to forget about metrics.mean_squared_error() and compute the MSE yourself, with the correct axis argument:

    xent_loss = 10 * K.mean(K.square(x_ - x_decoded_mean_), axis=[1, 2])
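
The shape reasoning above can be checked without building the model. The following is only a minimal sketch in NumPy, not the Keras code itself. It assumes that metrics.mean_squared_error() behaves like a mean of squared errors over the last axis, and that NumPy's broadcasting rules match TensorFlow's for this case. The helper mse_last_axis and the random arrays are hypothetical stand-ins for the Keras function and the real tensors.

```python
import numpy as np

batch_size, n_frames, n_features = 100, 10, 1
latent_dim = 2

# Random stand-ins for the sliced tensors and the latent statistics (illustrative only).
rng = np.random.default_rng(0)
x_ = rng.normal(size=(batch_size, n_frames, n_features))
x_decoded_mean_ = rng.normal(size=(batch_size, n_frames, n_features))
z_mean = rng.normal(size=(batch_size, latent_dim))
z_log_var = rng.normal(size=(batch_size, latent_dim))


def mse_last_axis(y_true, y_pred):
    """Hypothetical stand-in for metrics.mean_squared_error: mean over the last axis only."""
    return np.mean(np.square(y_pred - y_true), axis=-1)


# Original loss terms: (100, 10) versus (100,).
xent_loss = 10 * mse_last_axis(x_, x_decoded_mean_)
kl_loss = -0.5 * np.sum(1 + z_log_var - np.square(z_mean) - np.exp(z_log_var), axis=-1)

# Adding them fails because the trailing dimensions (10 and 100) do not match.
try:
    xent_loss + kl_loss
    shapes_compatible = True
except ValueError:
    shapes_compatible = False

# Fix 1: take another mean over the time axis.
fix_mean = 10 * np.mean(mse_last_axis(x_, x_decoded_mean_), axis=-1)

# Fix 2: squeeze the size-1 feature axis first, so the last axis is the time axis.
fix_squeeze = 10 * mse_last_axis(np.squeeze(x_, axis=-1),
                                 np.squeeze(x_decoded_mean_, axis=-1))

# Fix 3: compute the MSE directly over both the time and feature axes.
fix_manual = 10 * np.mean(np.square(x_ - x_decoded_mean_), axis=(1, 2))

print(xent_loss.shape, kl_loss.shape, shapes_compatible)
print(fix_mean.shape, fix_squeeze.shape, fix_manual.shape)
```

With a single feature per frame, all three fixes give one loss value per sample, so each can be added to kl_loss. With more than one feature the squeeze variant no longer applies, which is why the explicit axis version is the most general of the three.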