Tags: python, tensorflow, keras, time-series, recurrent-neural-network

tf.get_shape() returning None incorrectly


I'm trying to implement a WGAN with gradient penalty in Keras, following the setup here: https://keras.io/examples/generative/wgan_gp/. However, I have modified it to generate time series using RNNs.

The time series in the training set are of variable lengths, so I am training the model using train_on_batch with one time series at a time. I have modified the train_step function in the code linked above to handle this, with the new function given by

    def train_step(self, X):
        if isinstance(X, tuple):
            X = X[0]
        batch_size = X.get_shape()[0]
        timesteps = X.get_shape()[1]

        for i in range(self.d_steps):
            # Get the latent vector
            noise = tf.random.normal((batch_size, self.latent_dims))
            noise = tf.reshape(noise, (batch_size, 1, self.latent_dims))
            noise = tf.repeat(noise, timesteps, 1)
            with tf.GradientTape() as tape:
                fake_images = self.generator(noise, training=True)
                fake_logits = self.discriminator(fake_images, training=True)
                real_logits = self.discriminator(X, training=True)

                d_cost = self.d_loss_fn(real_img=real_logits, fake_img=fake_logits)
                gp = self.gradient_penalty(batch_size, X, fake_images)
                d_loss = d_cost + gp * self.gp_weight

            d_gradient = tape.gradient(d_loss, self.discriminator.trainable_variables)
            self.d_optimizer.apply_gradients(
                zip(d_gradient, self.discriminator.trainable_variables)
            )

        noise = tf.random.normal((batch_size, self.latent_dims))
        noise = tf.reshape(noise, (batch_size, 1, self.latent_dims))
        noise = tf.repeat(noise, timesteps, 1)
        with tf.GradientTape() as tape:
            generated_data = self.generator(noise, training=True)
            gen_img_logits = self.discriminator(generated_data, training=True)
            g_loss = self.g_loss_fn(gen_img_logits)

        gen_gradient = tape.gradient(g_loss, self.generator.trainable_variables)
        self.g_optimizer.apply_gradients(
            zip(gen_gradient, self.generator.trainable_variables)
        )
        return {"d_loss": d_loss, "g_loss": g_loss}

and run this using

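    # Assumes epochs and nbatches are integer counts.
    for epoch in range(epochs):
        # The column is called "name" (see the filter below), not "names".
        names = train_df.name.unique()
        for batch in range(nbatches):
            name = names[batch]
            X = train_df[train_df.name == name].values
            X = X[:, 1:]  # removes name column
            X = X.reshape((1, *X.shape))  # add a batch axis: (1, timesteps, 12)
            wgan.train_on_batch(X)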

Here, train_df is a pandas DataFrame with 13 columns. The first column contains the name of each time series and is used to separate out the data. The other 12 columns contain the observations in the time series, with values between 0 and 1.

The idea is that, for each time series, the first part of train_step generates noise with the same number of timesteps as that series, which ensures that the generated data has the same shape as the real data.

The number of timesteps is supposed to be given by X.get_shape()[1]. On the first iteration, X is a numpy array of shape (1, 18, 12), and when it is passed to train_step the tensor X also has shape (1, 18, 12), so timesteps = X.get_shape()[1] sets timesteps to 18 as expected. On the second iteration, X is a numpy array of shape (1, 15, 12), and this also works as expected. However, on the third iteration X is a numpy array of shape (1, 13, 12), but inside train_step its shape is now (1, None, 12), so timesteps is set to None and the code fails.

I'm very confused about why X.get_shape() works correctly for the first two iterations but not from the third onwards, and I can't find a fix. Basically, I just need to set timesteps to the correct value. I also considered passing the value in as a separate variable rather than relying on get_shape, but I can't think of a way to do that. Can anyone suggest why get_shape starts returning None after 2 iterations and how to avoid it? If you've got this far, thanks very much for reading and apologies for the length!


Solution

  • Try changing:

    timesteps = X.get_shape()[1]
    

    To:

    timesteps = tf.shape(X)[1]
    

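    to get the dynamic shape of X during training. tf.shape(X) is evaluated at run time, whereas X.get_shape() reports the static shape known when the graph is traced, which can contain None.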
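
A likely explanation for the None, assuming Keras compiles train_step with tf.function and relaxes the input shape after it has seen several different sequence lengths, is that the function ends up traced with an unknown time axis. The sketch below reproduces that situation directly with an input_signature of (1, None, 12) and shows that the static shape is None while tf.shape still yields the real length. The names make_noise, traced_static, and latent_dims are hypothetical and not part of the original code.

```python
import tensorflow as tf

latent_dims = 4      # hypothetical latent size, for illustration only
traced_static = []   # records what get_shape() reports while tracing


# The None in the signature mimics the (1, None, 12) shape from the question.
@tf.function(input_signature=[tf.TensorSpec(shape=(1, None, 12), dtype=tf.float32)])
def make_noise(X):
    # Static shape: fixed at trace time, so the time axis is unknown here.
    traced_static.append(X.get_shape()[1])

    # Dynamic shape: scalar tensors that are resolved when the function runs.
    batch_size = tf.shape(X)[0]
    timesteps = tf.shape(X)[1]

    noise = tf.random.normal((batch_size, 1, latent_dims))
    # Repeat the same latent vector along the time axis.
    return tf.repeat(noise, timesteps, axis=1)


for length in (18, 15, 13):
    noise = make_noise(tf.zeros((1, length, 12)))
    print(length, noise.shape)

print("static time axis seen while tracing:", traced_static)
```

In train_step the same idea applies: timesteps becomes a scalar tensor rather than a Python int, which tf.repeat accepts as its repeats argument.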