I've been following this tutorial on building a custom training loop with Keras from scratch and have been trying to apply it to my own problem.
I have a set of images of shape (71, 71, 3), each paired with an additional piece of metadata: a single float.
To apply this to the tutorial I used the following code for data preparation:
import tensorflow as tf
batch_size = 64
# Prepare the training dataset.
t1 = tf.data.Dataset.from_tensor_slices((image_train, meta_train))
t2 = tf.data.Dataset.from_tensor_slices(label_train)
train_dataset = tf.data.Dataset.zip((t1, t2))
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(batch_size)
# Prepare the validation dataset.
v1 = tf.data.Dataset.from_tensor_slices((image_test, meta_test))
v2 = tf.data.Dataset.from_tensor_slices(label_test)
val_dataset = tf.data.Dataset.zip((v1, v2))
val_dataset = val_dataset.batch(batch_size)
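As a sanity check (this part is not in the tutorial), printing the element_spec and one batch should show that each element is a ((images, meta), labels) tuple, which is exactly what the loop below unpacks:

# Each element of train_dataset should be ((images, meta), labels).
print(train_dataset.element_spec)
for (img_batch, meta_batch), label_batch in train_dataset.take(1):
    print(img_batch.shape, meta_batch.shape, label_batch.shape)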
However, when I start training with the exact code provided in the tutorial:
epochs = 2
for epoch in range(epochs):
    print("\nStart of epoch %d" % (epoch,))

    # Iterate over the batches of the dataset.
    for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):

        # Open a GradientTape to record the operations run
        # during the forward pass, which enables auto-differentiation.
        with tf.GradientTape() as tape:

            # Run the forward pass of the layer.
            # The operations that the layer applies
            # to its inputs are going to be recorded
            # on the GradientTape.
            logits = model(x_batch_train, training=True)  # Logits for this minibatch

            # Compute the loss value for this minibatch.
            loss_value = loss_fn(y_batch_train, logits)

        # Use the gradient tape to automatically retrieve
        # the gradients of the trainable variables with respect to the loss.
        grads = tape.gradient(loss_value, model.trainable_weights)

        # Run one step of gradient descent by updating
        # the value of the variables to minimize the loss.
        optimizer.apply_gradients(zip(grads, model.trainable_weights))

        # Log every 200 batches.
        if step % 200 == 0:
            print(
                "Training loss (for one batch) at step %d: %.4f"
                % (step, float(loss_value))
            )
            print("Seen so far: %s samples" % ((step + 1) * batch_size))
the count of samples "seen so far" does not increase, unlike in the tutorial. This is my output:
Start of epoch 0
Training loss (for one batch) at step 0: 201.9029
Seen so far: 64 samples
Start of epoch 1
Training loss (for one batch) at step 0: 0.1668
Seen so far: 64 samples
Start of epoch 2
Training loss (for one batch) at step 0: 0.1449
Seen so far: 64 samples
Start of epoch 3
Training loss (for one batch) at step 0: 0.2491
Seen so far: 64 samples
whereas the tutorial's output keeps counting up within a single epoch:
Start of epoch 0
Training loss (for one batch) at step 0: 155.5935
Seen so far: 64 samples
Training loss (for one batch) at step 200: 1.3908
Seen so far: 12864 samples
Training loss (for one batch) at step 400: 1.1934
Seen so far: 25664 samples
The rest of the training seems fine and the model appears to work, but I'm not sure whether it is doing what it's supposed to do. I've tried changing the batch_size, but that did not solve the problem.
Are you using the same dataset? As I see it, there is no problem: your dataset just seems to be smaller (or you are using a larger batch_size), so one epoch contains fewer than 200 batches and only the step-0 log line is ever printed.
What happens is that the counter of examples seen so far is per epoch: it restarts at every epoch. As you can see in your output, each "block" of output belongs to a different epoch, so the counter restarted each time. In the Keras example, by contrast, the counter increases from one log line to the next because all the logged steps fall within the same epoch.
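If you want to double-check, count how many batches one epoch of your dataset yields; with batch_size = 64, anything below 200 batches (i.e. fewer than roughly 12,800 samples) means the step-200 log line is never reached. A minimal sketch, reusing train_dataset and batch_size from your question (assuming TF 2.3+; on older versions use tf.data.experimental.cardinality instead):

# Cardinality is known here because the dataset comes from in-memory arrays.
num_batches = train_dataset.cardinality().numpy()
print("Batches per epoch:", num_batches)
print("Samples per epoch (at most):", num_batches * batch_size)

And if you would rather have a counter that accumulates across epochs, you can track it yourself outside the epoch loop. Here is a sketch of that variant; it is the same loop as in your question, only the bookkeeping changes:

samples_seen = 0  # accumulates across ALL epochs, unlike the tutorial's counter
for epoch in range(epochs):
    print("\nStart of epoch %d" % (epoch,))
    for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):
        with tf.GradientTape() as tape:
            logits = model(x_batch_train, training=True)
            loss_value = loss_fn(y_batch_train, logits)
        grads = tape.gradient(loss_value, model.trainable_weights)
        optimizer.apply_gradients(zip(grads, model.trainable_weights))

        # Use the actual batch size: the last batch of an epoch may be smaller.
        samples_seen += int(tf.shape(y_batch_train)[0])
        if step % 200 == 0:
            print(
                "Training loss (for one batch) at step %d: %.4f"
                % (step, float(loss_value))
            )
            print("Seen so far (all epochs): %d samples" % samples_seen)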