I'm training a deep network with two data input pipelines, one for training and one for validation. They use `shuffle_batch_join` and `batch_join` respectively for parallel data reading. The data stream fed to the network is selected by a `tf.cond` operation on top of these two pipelines, controlled by an `is_training` placeholder that is set to true for a training iteration and to false when doing validation. I have 4 threads reading training data and 1 thread reading validation data.
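For reference, the setup looks roughly like this. The reader function, file names, feature shapes, and batch/capacity numbers are illustrative placeholders, not my actual code:

```python
import tensorflow as tf

def make_example(filename_queue):
    # Hypothetical reader: parses one (image, label) example per call.
    reader = tf.TFRecordReader()
    _, serialized = reader.read(filename_queue)
    features = tf.parse_single_example(
        serialized,
        features={
            "image": tf.FixedLenFeature([784], tf.float32),
            "label": tf.FixedLenFeature([], tf.int64),
        })
    return features["image"], features["label"]

train_files = tf.train.string_input_producer(["train.tfrecords"])
val_files = tf.train.string_input_producer(["val.tfrecords"])

# 4 reader threads for training, 1 for validation, as described above.
train_examples = [make_example(train_files) for _ in range(4)]
val_examples = [make_example(val_files) for _ in range(1)]

train_batch = tf.train.shuffle_batch_join(
    train_examples, batch_size=32, capacity=1000, min_after_dequeue=500)
val_batch = tf.train.batch_join(val_examples, batch_size=32, capacity=1000)

# Select between the two pipelines with tf.cond on an is_training placeholder.
is_training = tf.placeholder(tf.bool, shape=[], name="is_training")
images, labels = tf.cond(is_training,
                         lambda: train_batch,
                         lambda: val_batch)
```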
However, I just added the queue summaries to TensorBoard and observed that the validation queue's summary (showing the fraction of the queue that is full) becomes non-zero at some point during training, then drops back to 0. This seems very weird because validation runs only after 1K iterations, so those data points should only be dequeued at that point. Has anyone had a similar experience, or can anyone shed some light on what might be happening?
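For context, my run loop is roughly the following, continuing from the sketch above. The loss, optimizer, step counts, and log directory are made-up placeholders; the queue fraction summaries themselves are created automatically by `batch_join`/`shuffle_batch_join`, so merging all summaries is enough to see them:

```python
# Dummy model so the sketch is runnable; my real network is much deeper.
logits = tf.layers.dense(images, 10)
loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=labels, logits=logits))
train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

merged = tf.summary.merge_all()  # picks up the queues' fraction_full summaries
writer = tf.summary.FileWriter("/tmp/logs")

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    for step in range(10000):
        _, summ = sess.run([train_op, merged], feed_dict={is_training: True})
        writer.add_summary(summ, step)
        if step > 0 and step % 1000 == 0:
            # Validation: only here should the validation queue be drained.
            sess.run(loss, feed_dict={is_training: False})
    coord.request_stop()
    coord.join(threads)
```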
Answered on TensorFlow Discuss Forum (https://groups.google.com/a/tensorflow.org/forum/#!topic/discuss/mLrt5qc9_uU)