Tags: tensorflow, keras, google-colaboratory, tensor, tpu

Getting "ValueError: input tensor Tensor to TPUStrategy.run() has unknown rank, which is not allowed" in Google Colab


I'm trying to train an RNN generative model on a TPU in Google Colab. The full code of the notebook can be found here. In brief, I take text files, split them into sequences and targets, and build a tf.data.Dataset from the resulting lists. Then I prepare a vocabulary and create a Keras TextVectorization layer with that pre-set vocabulary. Next I prepare a one-hot dataset that should return sequences (features) as (60, 107) tensors and targets as (107,) tensors. Finally I create a simple model with one LSTM layer inside with strategy.scope(): and try to train the model in a loop. Something like this:

import tensorflow as tf
tpu = tf.distribute.cluster_resolver.TPUClusterResolver.connect()
strategy = tf.distribute.TPUStrategy(tpu)

#some code to make dataset (skipped)... and then

one_hot_dataset = dataset.map(lambda x, y: (tf.one_hot(text_vectorizer(x), 
                                                       depth=vocab_size, 
                                                       dtype='float32'), 
                                            tf.squeeze(tf.one_hot(text_vectorizer(y), 
                                                                  depth=vocab_size, 
                                                                  dtype='float32', 
                                                                  axis=1))))

batch_size = 1024 # I hope this can load a TPU sufficiently

one_hot_dataset = one_hot_dataset.batch(batch_size=batch_size,
                                        num_parallel_calls=4)

one_hot_dataset = one_hot_dataset.prefetch(buffer_size=tf.data.AUTOTUNE)

# making a model

import keras
from keras import layers, Model

with strategy.scope():
    inputs = keras.Input(shape=(maxlen, vocab_size), dtype='float32')
    lstm_output = layers.LSTM(128)(inputs)
    output = layers.Dense(vocab_size, activation='softmax')(lstm_output)

    model = Model(inputs, output)

    model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=0.01),
                  loss='categorical_crossentropy')

# and finally

model.fit(one_hot_dataset, epochs=1)

Then I get an error:

ValueError: in user code:

    File "/usr/local/lib/python3.10/dist-packages/keras/engine/training.py", line 1284, in train_function  *
        return step_function(self, iterator)
    File "/usr/local/lib/python3.10/dist-packages/keras/engine/training.py", line 1268, in step_function  **
        outputs = model.distribute_strategy.run(run_step, args=(data,))

    ValueError: input tensor Tensor("cond/Identity_8:0", dtype=float32) to TPUStrategy.run() has unknown rank, which is not allowed

which I can't even find anything about by searching.

The model trains normally on a GPU (but too slowly). I have successfully used the same boilerplate code to run on a TPU in a similar situation before. I am a bit suspicious of my dataset, but unfortunately I can't pin down the issue.


Solution

  • Can you make sure the dataset element shapes are set (except for the batch dimension) and that printing one_hot_dataset.element_spec shows the shapes you expect?

    E.g.

    def map_fn(x, y):
      first = tf.one_hot(text_vectorizer(x), depth=vocab_size, dtype='float32')
      first.set_shape((60, 107))
      second = tf.squeeze(tf.one_hot(text_vectorizer(y), depth=vocab_size,
                                     dtype='float32', axis=1))
      second.set_shape(my_shape)  # my_shape: the expected target shape, e.g. (107,) as described in the question
      return (first, second)
    
    one_hot_dataset = dataset.map(map_fn)
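
To see what the element_spec check looks like in practice, here is a minimal sketch that does not need a TPU. It does not reproduce the asker's pipeline: tf.py_function is used only as a stand-in for whichever op loses the static shape (an assumption), and the names make_pair, restore_shapes, MAXLEN and VOCAB_SIZE are hypothetical. The shapes (60, 107) and (107,) are the ones quoted in the question.

```python
import tensorflow as tf

MAXLEN, VOCAB_SIZE = 60, 107  # feature/target sizes quoted in the question


def make_pair(i):
    # tf.py_function returns tensors without static shape information.
    # It stands in for the op that loses the rank in the real pipeline
    # (an assumption made for this sketch).
    feats, target = tf.py_function(
        lambda _: (tf.zeros((MAXLEN, VOCAB_SIZE)), tf.zeros((VOCAB_SIZE,))),
        inp=[i],
        Tout=[tf.float32, tf.float32],
    )
    return feats, target


def restore_shapes(feats, target):
    # Re-attach the static shapes, as in the answer's map_fn.
    feats.set_shape((MAXLEN, VOCAB_SIZE))
    target.set_shape((VOCAB_SIZE,))
    return feats, target


raw = tf.data.Dataset.range(4).map(make_pair)
# drop_remainder=True also makes the batch dimension static.
fixed = raw.map(restore_shapes).batch(2, drop_remainder=True)

print(raw.element_spec)    # both shapes are reported as unknown
print(fixed.element_spec)  # fully defined shapes, batch dimension included
```

If one_hot_dataset.element_spec in the notebook prints unknown shapes like raw does here, that would be consistent with the "unknown rank" error, and setting the shapes inside the map function is the fix to try.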