Writing ZipDataSet to TFRecord

I'm trying to write a zipped dataset to TFRecord files following this tutorial, but my situation is different in that each element of each dataset in the ZipDataSet is a tensor rather than a scalar.

The tutorial addresses this case with the note:

Note: To stay simple, this example only uses scalar inputs. The simplest way to handle non-scalar features is to use tf.serialize_tensor to convert tensors to binary-strings. Strings are scalars in tensorflow. Use tf.parse_tensor to convert the binary-string back to a tensor.

But I'm getting errors that seem to indicate that the _bytes_feature function is getting tensors rather than bytes.

import tensorflow as tf
import numpy as np

sess = tf.Session()
def _bytes_feature(value):
    """Returns a bytes_list from a string / byte."""
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def serialize_with_labels(a, b, c, d):
    """Creates a tf.Example message ready to be written to a file."""

    # Create a dictionary mapping the feature name to the tf.Example-compatible
    # data type.

    feature = {'a': _bytes_feature(a),
               'b': _bytes_feature(b),
               'c': _bytes_feature(c),
               'd': _bytes_feature(d)}

    # Create a Features message using tf.train.Example.

    example_proto = tf.train.Example(features=tf.train.Features(feature=feature))
    return example_proto.SerializeToString()

def tf_serialize_w_labels(a, b, c, d):
    """Wraps serialize_with_labels in tf.py_func so it can be mapped over a dataset."""
    tf_string = tf.py_func(serialize_with_labels,
                           (a, b, c, d),
                           tf.string)
    return tf.reshape(tf_string, ())

# a is a [n,m,p] tensor
# b is a [n,m,p] tensor
# c is a [n,m,p] tensor
# d is a [n,1,1] tensor

zipped = tf.data.Dataset.zip((a,b,c,d))
# I have confirmed that each item of serial_tensors is a tuple
# of four bytestrings.
serial_tensors =

# Each item of serialized_features_dataset is a single bytestring
serialized_features_dataset =
writer = tf.contrib.data.TFRecordWriter('test_output')
writeop = writer.write(serialized_features_dataset)

That is the basic format of the code I'm trying to run. It writes, but when I read the TFRecord back in, a, b, and c parse correctly while d prints nothing:

def _parse_function(example_proto):
    # Parse the input tf.Example proto using the dictionary below.

    feature_description = {
        'a': tf.FixedLenFeature([], tf.string, default_value=''),
        'b': tf.FixedLenFeature([], tf.string, default_value=''),
        'c': tf.FixedLenFeature([], tf.string, default_value=''),
        'd': tf.FixedLenFeature([], tf.string, default_value='')
    }
    return tf.parse_single_example(example_proto, feature_description)

filenames = ['zipped_TFR']
raw_dataset = tf.data.TFRecordDataset(filenames)
parsed = raw_dataset.map(_parse_function)
parsed_it = parsed.make_one_shot_iterator()

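# prints the first element of a
print(sess.run(tf.parse_tensor(parsed_it.get_next()['a'], out_type=tf.int32)))
# prints the first element of b
print(sess.run(tf.parse_tensor(parsed_it.get_next()['b'], out_type=tf.int32)))
# prints the first element of c
print(sess.run(tf.parse_tensor(parsed_it.get_next()['c'], out_type=tf.int32)))
# prints nothing
print(sess.run(tf.parse_tensor(parsed_it.get_next()['d'], out_type=tf.int32)))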

This isn't a matter of the iterator running out, as, for example, I've tried printing d before printing a, b, or c, gotten nothing, and then successfully printed a in the same session.

I'm using tensorflow-gpu version 1.10, and I'm stuck with it for the moment, which is why I'm using

writer = tf.contrib.data.TFRecordWriter('test_output')

instead of

writer = tf.data.experimental.TFRecordWriter('test_output')

EDIT: Here is what worked.

First I flattened a, b, c and d down to shape [n,-1]. Then I changed serialize_w_labels to the code below (leaving tf_serialize_w_labels alone).

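def serialize_w_labels(a, b, c, d, n, m, p):
    # The object we return
    ex = tf.train.SequenceExample()
    # Non-sequential (context) features of our example.
    # Assumed to be int64 scalars, matching the parsing spec below.
    ex.context.feature["d"].int64_list.value.append(d)
    ex.context.feature["m"].int64_list.value.append(m)
    ex.context.feature["n"].int64_list.value.append(n)
    ex.context.feature["p"].int64_list.value.append(p)
    # Feature lists for the three sequential features of our example
    fl_a = ex.feature_lists.feature_list["a"]
    fl_b = ex.feature_lists.feature_list["b"]
    fl_c = ex.feature_lists.feature_list["c"]
    for _a, _b, _c in zip(a, b, c):
        # Loop body assumed from the parsing spec: a and b are int64, c is float32.
        fl_a.feature.add().int64_list.value.append(_a)
        fl_b.feature.add().int64_list.value.append(_b)
        fl_c.feature.add().float_list.value.append(_c)
    return ex.SerializeToString()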

The following correctly parses elements of the resulting dataset:

context_features = {
    "d": tf.FixedLenFeature([], dtype=tf.int64),
    "m": tf.FixedLenFeature([], dtype=tf.int64),
    "n": tf.FixedLenFeature([], dtype=tf.int64),
    "p": tf.FixedLenFeature([], dtype=tf.int64)
}
sequence_features = {
    "a": tf.FixedLenSequenceFeature([], dtype=tf.int64),
    "b": tf.FixedLenSequenceFeature([], dtype=tf.int64),
    "c": tf.FixedLenSequenceFeature([], dtype=tf.float32)
}

# example_proto is assumed to be one serialized SequenceExample string
context_parsed, sequence_parsed = tf.parse_single_sequence_example(
    serialized=example_proto,
    context_features=context_features,
    sequence_features=sequence_features)

Your dtypes may vary, obviously. The context features can then be used to reshape the flattened a, b, and c.
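
To make the flatten-then-reshape step concrete, here is a minimal NumPy sketch of the round trip. It is only an illustration: the sizes n, m, p and the stand-in array are hypothetical, and NumPy's reshape takes the place of the TensorFlow reshape that would run on the parsed features.

```python
import numpy as np

# Hypothetical sizes standing in for the n, m, p of the post.
n, m, p = 2, 3, 4

# A stand-in for one of the [n, m, p] tensors.
a = np.arange(n * m * p, dtype=np.int64).reshape(n, m, p)

# Flatten everything except the leading axis, as done before serializing.
a_flat = a.reshape(n, -1)  # shape (n, m * p)

# After parsing, each example gives one flat row plus the stored m and p,
# which is enough to recover the original [m, p] slice.
restored = [row.reshape(m, p) for row in a_flat]

assert all(np.array_equal(r, a[i]) for i, r in enumerate(restored))
```

Storing m and p as context features is what makes the last step possible; the flat row alone does not say how to split m * p values into rows and columns.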


  • I think you should look into, which should allow you to write a sequence of features as a feature to a TFRecord file. It was used, for example, in the YouTube8M dataset to store a feature that, for each video, was a set of frames, with a Tensor for each frame.


    Example of how to read it: