Notes:
This question extends a previous question of mine, in which I asked about the best way to store some dummy data as Example and SequenceExample, seeking to know which is better for data similar to the dummy data provided. I provide explicit formulations of both the Example and SequenceExample construction there, as well as, in the answers, a programmatic way to do so.
Because this is still a lot of code, I am providing a Colab (an interactive Jupyter notebook hosted by Google) where you can try the code out yourself. All the necessary code is there and it is generously commented.
I am trying to learn how to convert my data into TFRecords, as the claimed benefits are worthwhile for my data. However, the documentation leaves a lot to be desired, and the tutorials / blogs (that I have seen) which try to go deeper really only touch the surface or rehash the sparse docs that exist.
For the demo data considered in my previous question - as well as here - I have written a decent class that takes a fixed-length sequence with several channels, soft-labeled class probabilities, and some metadata, and can encode the data in 1 of 6 forms (forms 1 and 4 are sketched after the list):
1. Example, with sequence channels / classes separate in a numeric type (int64 in this case) with meta data tacked on
2. Example, with sequence channels / classes stored as a byte string (via numpy.ndarray.tostring()) with meta data tacked on
3. Example, with sequence / classes dumped as byte string with meta data tacked on
4. SequenceExample, with sequence channels / classes separate in a numeric type and meta data as context
5. SequenceExample, with sequence channels dumped as byte string and meta data as context
6. SequenceExample, with sequence and classes dumped as byte string and meta data as context
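To make forms 1 and 4 concrete, here is a minimal sketch of how the two protos can be built. The data, shapes, and feature names (sequence, pclasses, Name, Val_1, Val_2) are assumptions lifted from the parse code further down, not the exact code of my class:

import numpy as np
import tensorflow as tf

# Hypothetical dummy data; shapes mirror the parse spec below.
seq      = np.random.randint(0, 10, (5, 3))    # 5 time steps, 3 channels
pclasses = np.random.dirichlet([1, 1, 1], 5)   # 5 time steps, 3 class probabilities
name, val_1, val_2 = b'dummy', 0.1, 0.2        # meta data

# Form 1: Example -- numeric features flattened, meta data tacked on.
example = tf.train.Example(features=tf.train.Features(feature={
    'sequence': tf.train.Feature(int64_list=tf.train.Int64List(value=seq.flatten())),
    'pclasses': tf.train.Feature(float_list=tf.train.FloatList(value=pclasses.flatten())),
    'Name'    : tf.train.Feature(bytes_list=tf.train.BytesList(value=[name])),
    'Val_1'   : tf.train.Feature(float_list=tf.train.FloatList(value=[val_1])),
    'Val_2'   : tf.train.Feature(float_list=tf.train.FloatList(value=[val_2])),
}))

# Form 4: SequenceExample -- meta data as context, one Feature per time step
# in each FeatureList.
sequence_example = tf.train.SequenceExample(
    context=tf.train.Features(feature={
        'Name' : tf.train.Feature(bytes_list=tf.train.BytesList(value=[name])),
        'Val_1': tf.train.Feature(float_list=tf.train.FloatList(value=[val_1])),
        'Val_2': tf.train.Feature(float_list=tf.train.FloatList(value=[val_2])),
    }),
    feature_lists=tf.train.FeatureLists(feature_list={
        'sequence': tf.train.FeatureList(feature=[
            tf.train.Feature(int64_list=tf.train.Int64List(value=step)) for step in seq
        ]),
        'pclasses': tf.train.FeatureList(feature=[
            tf.train.Feature(float_list=tf.train.FloatList(value=step)) for step in pclasses
        ]),
    })
)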
This works fine.
In the Colab I show how to write the dummy data both into a single file and into separate files (a sketch of both patterns follows).
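For completeness, a sketch of both write patterns, assuming serialized_protos holds the SerializeToString() byte strings of the protos; the filenames are chosen to match the read code below:

import tensorflow as tf

serialized_protos = [sequence_example.SerializeToString()]  # e.g. from the sketch above

# Everything in one file ...
with tf.python_io.TFRecordWriter('dummy_sequences.tfrecords') as writer:
    for serialized in serialized_protos:
        writer.write(serialized)

# ... or one file per example / shard.
for i, serialized in enumerate(serialized_protos):
    with tf.python_io.TFRecordWriter(f'dummy_sequences_{i}.tfrecords') as writer:
        writer.write(serialized)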
My question is: how can I recover this data? I have made 4 attempts at doing so in the linked file.
As an aside, why is TFRecordReader under a different sub-package from TFRecordWriter?
Solved by updating the features to include shape information and remembering that SequenceExamples are unnamed FeatureLists:
import os
import tensorflow as tf

# Context features are per-record scalars, so their shape is [].
context_features = {
    'Name' : tf.FixedLenFeature([], dtype=tf.string),
    'Val_1': tf.FixedLenFeature([], dtype=tf.float32),
    'Val_2': tf.FixedLenFeature([], dtype=tf.float32)
}

# Sequence features need the per-step shape: (3,) channels / class probabilities.
sequence_features = {
    'sequence': tf.FixedLenSequenceFeature((3,), dtype=tf.int64),
    'pclasses': tf.FixedLenSequenceFeature((3,), dtype=tf.float32),
}
def parse(record):
    # Returns a (context, feature_lists) tuple of dicts of tensors.
    parsed = tf.parse_single_sequence_example(
        record,
        context_features=context_features,
        sequence_features=sequence_features
    )
    return parsed
filenames = [os.path.join(os.getcwd(), f"dummy_sequences_{i}.tfrecords") for i in range(3)]
dataset = tf.data.TFRecordDataset(filenames).map(parse)

iterator = tf.data.Iterator.from_structure(dataset.output_types,
                                           dataset.output_shapes)
next_element = iterator.get_next()
training_init_op = iterator.make_initializer(dataset)
with tf.Session() as sess:
    for _ in range(2):
        # (Re-)initialize the iterator over the training dataset.
        sess.run(training_init_op)
        for _ in range(3):
            ne = sess.run(next_element)
            print(ne)
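Each ne is a tuple of two dicts: the parsed context (Name, Val_1 and Val_2 as scalars) and the parsed feature lists (sequence and pclasses, each of shape (num_steps, 3)).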