Accuracy in history dictionary differs from what is printed on screen

When training a model in Keras, the accuracies printed on screen at each epoch differ from those saved in the history object. For example (minimal test, output compacted):

history = model.fit(...)

Epoch 1/5
156/156 [===] - loss: 0.6325 - accuracy: 0.7700 - val_loss: 0.4330 - val_accuracy: 0.8156
Epoch 2/5
156/156 [===] - loss: 0.3855 - accuracy: 0.8538 - val_loss: 0.4692 - val_accuracy: 0.8050
Epoch 3/5
156/156 [===] - loss: 0.3918 - accuracy: 0.8427 - val_loss: 0.4666 - val_accuracy: 0.7861
Epoch 4/5
156/156 [===] - loss: 0.3820 - accuracy: 0.8461 - val_loss: 0.4101 - val_accuracy: 0.8014
Epoch 5/5
156/156 [===] - loss: 0.3927 - accuracy: 0.8492 - val_loss: 0.4092 - val_accuracy: 0.7979

Then (rounded to four decimals to match the printed values):

>>> [round(x, 4) for x in history.history['accuracy']]
[0.8184, 0.8474, 0.8484, 0.8488, 0.8476]
>>> [round(x, 4) for x in history.history['val_accuracy']]
[0.8156, 0.805, 0.7861, 0.8014, 0.7979]

As you can see, the validation accuracies match the printed values, but the training accuracies do not (tested both in Colab with a GPU and on a local PC with a CPU, using Keras 2.4.0 and TensorFlow 2.4.1).
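To make the mismatch explicit, the two sets of values can be compared programmatically. This is only a minimal sketch: `find_mismatches` is a hypothetical helper name (not part of Keras), the tolerance is an assumption chosen to absorb four-decimal rounding, and the lists are copied by hand from the output above.

```python
def find_mismatches(printed, recorded, tol=5e-5):
    """Return (epoch, printed, recorded) tuples whose values differ by more than tol."""
    return [
        (epoch, p, r)
        for epoch, (p, r) in enumerate(zip(printed, recorded), start=1)
        if abs(p - r) > tol
    ]

# Values copied from the progress-bar output and from history.history above
printed_acc = [0.7700, 0.8538, 0.8427, 0.8461, 0.8492]
history_acc = [0.8184, 0.8474, 0.8484, 0.8488, 0.8476]
printed_val_acc = [0.8156, 0.8050, 0.7861, 0.8014, 0.7979]
history_val_acc = [0.8156, 0.805, 0.7861, 0.8014, 0.7979]

train_mismatches = find_mismatches(printed_acc, history_acc)
val_mismatches = find_mismatches(printed_val_acc, history_val_acc)

print("training mismatches:", train_mismatches)
print("validation mismatches:", val_mismatches)
```

With these numbers, every training epoch is flagged and no validation epoch is.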

This is a problem if you want to save data from multiple tests to a file, for example. What am I getting wrong?

EDIT: here is an example to reproduce the problem, slightly modified from TF MNIST quickstart. See the block right after calling model.fit(). https://colab.research.google.com/drive/14Uogeq8wRlZlinaKLbkFr_Bl2aLzUJuy?usp=sharing

EDIT 2: as suggested by another user, I filed a bug report here: https://github.com/tensorflow/tensorflow/issues/48408

Solution

I ran your Colab notebook and was able to reproduce the issue. Yes, this looks like a serious bug. I tested the code in both CPU and GPU mode with TF 2.0, 2.1, and 2.3 without any problem, but the issue occurs in TF 2.4 and tf-nightly.

I would suggest filing a bug report on the TensorFlow GitHub and cross-linking it with this question so that others can follow updates. In the meantime, you can roll back to TF 2.3. Note that I didn't check whether callbacks.CSVLogger is also affected in the latest release; you may want to check that too.
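If you want to compare against whatever callbacks.CSVLogger writes, one option is to dump the history values to CSV yourself and diff the two files. The following is a rough sketch using only the standard library; `history_to_csv` is a name I made up (it is not a Keras function), and the `example` dict stands in for `history.history`, filled with the values from the question. It simply records whatever the history dict contains, so it does not by itself tell you which set of numbers is correct.

```python
import csv
import io

def history_to_csv(history_dict, fileobj):
    """Write a dict of per-epoch metric lists as CSV, one row per epoch."""
    names = list(history_dict)
    writer = csv.writer(fileobj, lineterminator="\n")
    writer.writerow(["epoch"] + names)
    # zip(*) walks the metric lists in parallel, one tuple per epoch
    for epoch, values in enumerate(zip(*(history_dict[n] for n in names)), start=1):
        writer.writerow([epoch, *values])

# Stand-in for history.history, using the values from the question
example = {
    "accuracy": [0.8184, 0.8474, 0.8484, 0.8488, 0.8476],
    "val_accuracy": [0.8156, 0.805, 0.7861, 0.8014, 0.7979],
}

buffer = io.StringIO()  # swap for open("history.csv", "w", newline="") to write a file
history_to_csv(example, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```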