Question: Is there a way to append to an existing TFRecord file?
Note: The .tfrecord file is created by my own script (it is not one I found on the web), so I have full control of its contents.
It is not possible to append to an existing records file as such, or at least not through the functions that TensorFlow provides. Record files are written at the C++ level by a PyRecordWriter, which calls the function NewWritableFile when it is created; that call deletes any existing file with the given name and creates a new one. However, it is possible to create a new records file with the contents of another one followed by new records.
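The overwrite behaviour described above is easy to check. The following is only a minimal sketch, assuming TensorFlow 2.x with eager execution; the scratch path and the record payloads are made up for the demonstration.

```python
import os
import tempfile

import tensorflow as tf

# Hypothetical scratch path, used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "demo.tfrecord")

# First writer: store two records.
with tf.io.TFRecordWriter(path) as writer:
    writer.write(b"first")
    writer.write(b"second")

# Second writer on the same path: the file is recreated, not extended.
with tf.io.TFRecordWriter(path) as writer:
    writer.write(b"third")

# Read the file back and collect the raw record bytes.
remaining = [rec.numpy() for rec in tf.data.TFRecordDataset([path])]
print(remaining)
```

Only the record written by the second writer should be left in the file, which is why a plain "open and keep writing" approach does not work here.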
For TensorFlow 1.x, you could do it like this:
import tensorflow as tf

def append_records_v1(in_file, new_records, out_file):
    with tf.io.TFRecordWriter(out_file) as writer:
        # Copy every record of the existing file into the new one.
        with tf.Graph().as_default(), tf.Session():
            ds = tf.data.TFRecordDataset([in_file])
            rec = ds.make_one_shot_iterator().get_next()
            while True:
                try:
                    writer.write(rec.eval())
                except tf.errors.OutOfRangeError:
                    # Raised once the input file is exhausted.
                    break
        # Then write the new (already serialized) records after them.
        for new_rec in new_records:
            writer.write(new_rec)
In TensorFlow 2.x (eager execution), you could do it like this:
import tensorflow as tf

def append_records_v2(in_file, new_records, out_file):
    with tf.io.TFRecordWriter(out_file) as writer:
        # Copy every record of the existing file into the new one.
        ds = tf.data.TFRecordDataset([in_file])
        for rec in ds:
            writer.write(rec.numpy())
        # Then write the new (already serialized) records after them.
        for new_rec in new_records:
            writer.write(new_rec)
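As a usage sketch for the TensorFlow 2.x version, assuming eager execution: the snippet below repeats the function so that it runs on its own, writes two records, "appends" a third through a temporary file, and then swaps the temporary file into place. The helper make_example, the feature name "x", and the file names are hypothetical and only serve the illustration.

```python
import os
import tempfile

import tensorflow as tf


def append_records_v2(in_file, new_records, out_file):
    # Same approach as above: copy the old records, then add the new ones.
    with tf.io.TFRecordWriter(out_file) as writer:
        for rec in tf.data.TFRecordDataset([in_file]):
            writer.write(rec.numpy())
        for new_rec in new_records:
            writer.write(new_rec)


def make_example(value):
    # Hypothetical helper: wrap one integer in a serialized tf.train.Example.
    feature = tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))
    example = tf.train.Example(features=tf.train.Features(feature={"x": feature}))
    return example.SerializeToString()


workdir = tempfile.mkdtemp()
old_path = os.path.join(workdir, "old.tfrecord")
tmp_path = os.path.join(workdir, "old.tfrecord.tmp")

# Create the "existing" file with two records.
with tf.io.TFRecordWriter(old_path) as writer:
    for v in (1, 2):
        writer.write(make_example(v))

# Write old + new records to a different path, then replace the original.
append_records_v2(old_path, [make_example(3)], tmp_path)
os.replace(tmp_path, old_path)

# Read everything back and decode the stored integers.
values = []
for rec in tf.data.TFRecordDataset([old_path]):
    example = tf.train.Example.FromString(rec.numpy())
    values.append(example.features.feature["x"].int64_list.value[0])
print(values)
```

Note that out_file has to differ from in_file: since the writer recreates its target file as soon as it is opened, pointing both at the same path would wipe the input before it could be copied.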