TensorFlow - Edit a TFRecord


Question: Is there a way to append records to an existing TFRecord file?

Note: The .tfrecord file is created by my own script (it is not a file I found on the web), so I have full control of its contents.


Solution

  • It is not possible to append to an existing records file as such, or at least not through the functions that TensorFlow provides. Record files are written at the C++ level by a PyRecordWriter, which calls the function NewWritableFile when it is created; that call deletes any existing file with the given name and creates a new one. However, you can create a new records file that contains the contents of the existing file followed by the new records.

    For TensorFlow 1.x, you could do it like this:

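    import tensorflow as tf

    def append_records_v1(in_file, new_records, out_file):
        # out_file must differ from in_file: opening the writer replaces out_file.
        with tf.io.TFRecordWriter(out_file) as writer:
            # Copy the existing records using a graph-mode dataset iterator.
            with tf.Graph().as_default(), tf.Session():
                ds = tf.data.TFRecordDataset([in_file])
                rec = ds.make_one_shot_iterator().get_next()
                while True:
                    try:
                        writer.write(rec.eval())
                    except tf.errors.OutOfRangeError:
                        # The input file is exhausted.
                        break
            # Write the new serialized records (bytes) after the copied ones.
            for new_rec in new_records:
                writer.write(new_rec)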
    

    In TensorFlow 2.x (eager execution), you could do it like this:

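    import tensorflow as tf

    def append_records_v2(in_file, new_records, out_file):
        # out_file must differ from in_file: opening the writer replaces out_file.
        with tf.io.TFRecordWriter(out_file) as writer:
            # Copy the existing records; each element is a scalar string tensor.
            ds = tf.data.TFRecordDataset([in_file])
            for rec in ds:
                writer.write(rec.numpy())
            # Write the new serialized records (bytes) after the copied ones.
            for new_rec in new_records:
                writer.write(new_rec)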
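
    If you want the result to end up under the original file name, one option is to write the combined records to a temporary file and then swap it in. The following is only a sketch for TensorFlow 2.x with uncompressed record files; the names append_in_place and read_all are hypothetical helpers made up for this example, not part of TensorFlow.

```python
import os
import tempfile

import tensorflow as tf


def append_in_place(path, new_records):
    """Emulate an append by rewriting `path` through a temporary file.

    Hypothetical helper: `new_records` is an iterable of serialized
    records (bytes), as in append_records_v2 above.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the target so that os.replace
    # does not have to move data across filesystems.
    fd, tmp_path = tempfile.mkstemp(suffix=".tfrecord", dir=directory)
    os.close(fd)
    try:
        with tf.io.TFRecordWriter(tmp_path) as writer:
            # Copy the existing records first.
            for rec in tf.data.TFRecordDataset([path]):
                writer.write(rec.numpy())
            # Then write the new records after them.
            for new_rec in new_records:
                writer.write(new_rec)
        # Swap the combined file in under the original name.
        os.replace(tmp_path, path)
    except BaseException:
        # Leave the original file untouched if anything went wrong.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def read_all(path):
    """Return every record in `path` as a list of bytes objects."""
    return [rec.numpy() for rec in tf.data.TFRecordDataset([path])]
```

    Calling append_in_place("data.tfrecord", [b"new record"]) and then read_all("data.tfrecord") should return the old records followed by the new one. This still copies the whole file, so its cost grows with the size of the existing file; it only hides the extra file name from the caller.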