Tags: python, tensorflow, tensorflow-datasets

Reading from .tfrecord files using tf.data.Dataset


I want to read the dataset generated by this code using the tf.data.Dataset API. According to the repo, each record was written like this:

def image_to_tfexample(image_data, image_format, height, width, class_id):
  return tf.train.Example(features=tf.train.Features(feature={
      'image/encoded': bytes_feature(image_data),
      'image/format': bytes_feature(image_format),
      'image/class/label': int64_feature(class_id),
      'image/height': int64_feature(height),
      'image/width': int64_feature(width),
  }))

It was called with (encoded byte-string, b'png', 32, 32, label) as arguments.

So, to read the .tfrecord file, the feature spec used for parsing would have to be:

# `example` is one serialized tf.train.Example read from the .tfrecord file
example_fmt = {
    'image/encoded': tf.FixedLenFeature((), tf.string, ""),
    'image/format': tf.FixedLenFeature((), tf.string, ""),
    'image/class/label': tf.FixedLenFeature((), tf.int64, -1),
    'image/height': tf.FixedLenFeature((), tf.int64, -1),
    'image/width': tf.FixedLenFeature((), tf.int64, -1),
}
parsed = tf.parse_single_example(example, example_fmt)
image = tf.decode_raw(parsed['image/encoded'], out_type=tf.uint8)

But it doesn't work: the dataset is empty after reading, and an iterator created from it raises OutOfRangeError: End of sequence.
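
One way to rule out the file itself, independent of any parsing code, is to count its records directly. The sketch below walks the TFRecord framing (an 8-byte little-endian length, a 4-byte CRC of the length, the payload, then a 4-byte CRC of the payload) in plain Python. count_tfrecords is a hypothetical helper name, not part of TensorFlow; it assumes an uncompressed file and does not verify the CRCs.

```python
import struct


def count_tfrecords(path):
    """Count records in an uncompressed TFRecord file by walking its framing."""
    count = 0
    with open(path, 'rb') as f:
        while True:
            header = f.read(8)
            if not header:
                break  # clean end of file
            if len(header) < 8:
                raise ValueError('truncated record header')
            # The first 8 bytes hold the payload length (little-endian uint64).
            (length,) = struct.unpack('<Q', header)
            # Skip the 4-byte length CRC, the payload, and the 4-byte payload CRC.
            rest = f.read(4 + length + 4)
            if len(rest) < length + 8:
                raise ValueError('truncated record body')
            count += 1
    return count
```

If this returns 0 for train.tfrecords, the file contains no records and the feature spec is not the cause of the empty dataset.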

A short Python script for reproduction can be found here. I'm struggling to find precise documentation or examples for this problem.


Solution

  • I can't test your code because I don't have the train.tfrecords file. Does the following code, with parse_fn being your parsing function, also produce an empty dataset?

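    import tensorflow as tf  # TF 1.x API

    dataset = tf.data.TFRecordDataset('train.tfrecords')
    dataset = dataset.map(parse_fn)
    itr = dataset.make_one_shot_iterator()
    next_element = itr.get_next()  # build the op once, outside the loop

    with tf.Session() as sess:
        while True:
            try:
                print(sess.run(next_element))
            except tf.errors.OutOfRangeError:
                break  # reached the end of the dataset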
    

    If this gives you an error, please let me know which line produces it.
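
For reference, here is a rough sketch of what parse_fn could look like for the feature layout in the question. It assumes the 'image/encoded' bytes are PNG-compressed, as the b'png' format tag suggests; in that case tf.image.decode_png, rather than tf.decode_raw, recovers the pixels. The tf.io names, FEATURE_SPEC, load_dataset, and channels=3 are my own choices for illustration, not something taken from the original script, and this is not a confirmed fix for the empty dataset.

```python
import tensorflow as tf

# Same keys, dtypes, and defaults as the spec in the question,
# written with the tf.io aliases.
FEATURE_SPEC = {
    'image/encoded': tf.io.FixedLenFeature((), tf.string, ''),
    'image/format': tf.io.FixedLenFeature((), tf.string, ''),
    'image/class/label': tf.io.FixedLenFeature((), tf.int64, -1),
    'image/height': tf.io.FixedLenFeature((), tf.int64, -1),
    'image/width': tf.io.FixedLenFeature((), tf.int64, -1),
}


def parse_fn(serialized):
    """Map one serialized tf.train.Example to an (image, label) pair."""
    parsed = tf.io.parse_single_example(serialized, FEATURE_SPEC)
    # Assumption: the bytes are a PNG file, so they must be decompressed,
    # not reinterpreted byte-for-byte as decode_raw would do.
    image = tf.image.decode_png(parsed['image/encoded'], channels=3)
    return image, parsed['image/class/label']


def load_dataset(path):
    """Build a dataset of (image, label) pairs from a TFRecord file."""
    return tf.data.TFRecordDataset(path).map(parse_fn)
```

With this in place, load_dataset('train.tfrecords') can replace the two dataset lines above.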