It is easy to use two threads that one keeps feeding data to the queue and the other consumes data from the queue and perform the computation. Since the TensorFlow recommends Dataset as input pipeline after 1.2.0., I would like to use the Dataset
and its iterator
to accomplish the task above, namely:
P.S. Why in the tutorial of Threading and Queues, TensorFlow uses thread
instead of process
?
Thank you in advance.
Distributed tf.contrib.data
pipelines are not yet supported as of TensorFlow 1.3. We are working on support for splitting datasets across devices and/or processes, but that support is not yet ready.
In the meantime, the easiest way to achieve your goal is to use a tf.FIFOQueue
. You can define a Dataset
that reads from a queue as follows:
q = tf.FIFOQueue(...)
# Define a dummy dataset that contains the same value repeated indefinitely.
dummy = tf.contrib.data.Dataset.from_tensors(0).repeat(None)
dataset_from_queue = dummy.map(lambda _: q.dequeue())
You can then compose other Dataset
tranformations with dataset_from_queue
.