I'm trying to train an RNN generative model using a TPU in Google Colab. You can find the full code of the notebook here. In brief, I take text files, chop them into sequences and targets, then make a tf.data.Dataset from the lists. Then I prepare a vocabulary and create a keras.TextVectorization object with that pre-set vocabulary. Then I prepare a one-hot dataset that should return sequences (features) as a (60, 107) tensor and targets as a (107,) tensor. Then I create a simple model with one LSTM layer inside
with strategy.scope():
and try to train the model in a loop. Something like this:
import tensorflow as tf

tpu = tf.distribute.cluster_resolver.TPUClusterResolver.connect()
strategy = tf.distribute.TPUStrategy(tpu)

# some code to make the dataset (skipped)... and then
one_hot_dataset = dataset.map(lambda x, y: (tf.one_hot(text_vectorizer(x),
                                                       depth=vocab_size,
                                                       dtype='float32'),
                                            tf.squeeze(tf.one_hot(text_vectorizer(y),
                                                                  depth=vocab_size,
                                                                  dtype='float32',
                                                                  axis=1))))

batch_size = 1024  # I hope this can load a TPU sufficiently
one_hot_dataset = one_hot_dataset.batch(batch_size=batch_size,
                                        num_parallel_calls=4)
one_hot_dataset = one_hot_dataset.prefetch(buffer_size=tf.data.AUTOTUNE)
# making a model
import keras
from keras import layers, Model

with strategy.scope():
    inputs = keras.Input(shape=(maxlen, vocab_size), dtype='float32')
    lstm_output = layers.LSTM(128)(inputs)
    output = layers.Dense(vocab_size, activation='softmax')(lstm_output)
    model = Model(inputs, output)
    model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=0.01),
                  loss='categorical_crossentropy')

# and finally
model.fit(one_hot_dataset, epochs=1)
Then I get an error:
ValueError: in user code:

    File "/usr/local/lib/python3.10/dist-packages/keras/engine/training.py", line 1284, in train_function  *
        return step_function(self, iterator)
    File "/usr/local/lib/python3.10/dist-packages/keras/engine/training.py", line 1268, in step_function  **
        outputs = model.distribute_strategy.run(run_step, args=(data,))

    ValueError: input tensor Tensor("cond/Identity_8:0", dtype=float32) to TPUStrategy.run() has unknown rank, which is not allowed
which I can't even google.
The model trains normally on a GPU (but too slowly). I have used this TPU boilerplate code successfully in a similar situation before. I am a bit suspicious about my dataset, but unfortunately I can't figure out what the issue is.
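For reference, the vectorizer is created roughly like this (a simplified sketch; the standardize and split arguments here are assumptions, the exact values are in the notebook):

import keras

# Simplified sketch of the vectorizer setup (arguments are assumptions);
# vocab is the pre-built token list passed as the vocabulary.
text_vectorizer = keras.layers.TextVectorization(standardize=None,
                                                 split='character',
                                                 vocabulary=vocab)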
Can you make sure the dataset element shapes are fully set (except the batch dimension) and check that printing one_hot_dataset.element_spec shows the correct shapes? TPUStrategy needs a static rank for every input tensor, so a map output with an unknown shape produces exactly this error.
E.g.:

def map_fn(x, y):
    first = tf.one_hot(text_vectorizer(x), depth=vocab_size, dtype='float32')
    first.set_shape((60, 107))
    second = tf.squeeze(tf.one_hot(text_vectorizer(y), depth=vocab_size,
                                   dtype='float32', axis=1))
    second.set_shape(my_shape)  # e.g. (107,) for your targets
    return (first, second)

one_hot_dataset = dataset.map(map_fn)
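With the shapes set, the element spec should be fully defined. A quick sanity check, assuming maxlen=60 and vocab_size=107 as in your question:

print(one_hot_dataset.element_spec)
# Expected output (before batching), roughly:
# (TensorSpec(shape=(60, 107), dtype=tf.float32, name=None),
#  TensorSpec(shape=(107,), dtype=tf.float32, name=None))

# Optionally, batching with drop_remainder=True keeps the batch dimension
# static as well, which TPUs generally require:
one_hot_dataset = one_hot_dataset.batch(batch_size, drop_remainder=True)
print(one_hot_dataset.element_spec)

If the spec still contains an unknown shape, the TPU cannot build a static graph for the input, which is what the "unknown rank" error is complaining about.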