How to use feed_dict with multiple GPUs in TensorFlow

I have recently been learning how to use TensorFlow on multiple GPUs to speed up training. I found an official tutorial that trains a classification model on the CIFAR-10 dataset. However, that tutorial reads images through a queue. Out of curiosity, how can I use multiple GPUs while feeding values into the Session with feed_dict? The part I cannot work out is how to feed different values from the same dataset to different GPUs. Thank you, everybody! The following code is part of the official tutorial.

images, labels = cifar10.distorted_inputs()
batch_queue = tf.contrib.slim.prefetch_queue.prefetch_queue(
      [images, labels], capacity=2 * FLAGS.num_gpus)
# Calculate the gradients for each model tower.
tower_grads = []
with tf.variable_scope(tf.get_variable_scope()):
  for i in xrange(FLAGS.num_gpus):
    with tf.device('/gpu:%d' % i):
      with tf.name_scope('%s_%d' % (cifar10.TOWER_NAME, i)) as scope:
        # Dequeues one batch for the GPU
        image_batch, label_batch = batch_queue.dequeue()
        # Calculate the loss for one tower of the CIFAR model. This function
        # constructs the entire CIFAR model but shares the variables across
        # all towers.
        loss = tower_loss(scope, image_batch, label_batch)

        # Reuse variables for the next tower.
        tf.get_variable_scope().reuse_variables()

        # Retain the summaries from the final tower.
        summaries = tf.get_collection(tf.GraphKeys.SUMMARIES, scope)

        # Calculate the gradients for the batch of data on this CIFAR tower.
        grads = opt.compute_gradients(loss)

        # Keep track of the gradients across all towers.
        tower_grads.append(grads)

Solution

QueueRunner and the queue-based API are relatively outdated. The TensorFlow docs state this clearly:

Input pipelines using the queue-based APIs can be cleanly replaced by the tf.data API

As a result, it is recommended to use the tf.data API. It is optimized for multi-GPU and TPU use.

How to use it?

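import tensorflow as tf  # TensorFlow 1.x API

# x_train, y_train: your NumPy arrays (features and one-hot labels, 2 classes here)
# batch(32) is an assumed batch size; tf.layers.dense expects batched input
dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train)).batch(32)
iterator = dataset.make_one_shot_iterator()
x, y = iterator.get_next()
# define your model
logits = tf.layers.dense(x, 2)  # use x directly in your model
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y))
train_step = tf.train.AdamOptimizer().minimize(cost)
with tf.Session() as sess:
  sess.run(tf.global_variables_initializer())  # variables must be initialized first
  sess.run(train_step)  # no feed_dict needed; the iterator supplies each batch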

You can create a separate iterator for each GPU with Dataset.shard(), or, more easily, use the Estimator API.
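
To make the "different data per GPU" idea concrete, here is a minimal sketch in plain Python (no TensorFlow needed). The function shard only mimics the selection rule documented for Dataset.shard(num_shards, index), which keeps the elements whose position modulo num_shards equals index. The function split_batch is a hypothetical helper, not part of TensorFlow, showing one way a single batch could be cut into per-GPU pieces if you do want to stay with feed_dict. The names samples and num_gpus are stand-ins chosen for illustration.

```python
def shard(elements, num_shards, index):
    """Keep the elements whose position modulo num_shards equals index.

    This mirrors the selection rule of Dataset.shard, applied to a plain
    Python list instead of a tf.data.Dataset.
    """
    if not 0 <= index < num_shards:
        raise ValueError("index must satisfy 0 <= index < num_shards")
    return elements[index::num_shards]


def split_batch(batch, num_gpus):
    """Split one batch into num_gpus contiguous slices of near-equal size."""
    base, extra = divmod(len(batch), num_gpus)
    slices, start = [], 0
    for i in range(num_gpus):
        size = base + (1 if i < extra else 0)  # the first `extra` slices get one more item
        slices.append(batch[start:start + size])
        start += size
    return slices


samples = list(range(10))  # stand-in for ten training examples
num_gpus = 2               # assumed GPU count, for illustration only

# Sharding: each GPU reads a disjoint, interleaved subset of the dataset.
per_gpu_shards = [shard(samples, num_gpus, i) for i in range(num_gpus)]

# Splitting: one batch of five examples is divided between the GPUs.
per_gpu_slices = split_batch(samples[:5], num_gpus)

print(per_gpu_shards)  # [[0, 2, 4, 6, 8], [1, 3, 5, 7, 9]]
print(per_gpu_slices)  # [[0, 1, 2], [3, 4]]
```

With the feed_dict approach, you would presumably create one pair of placeholders per tower inside the tf.device block and pass slice i to tower i's placeholders in a single sess.run call. With tf.data, the sharding rule above is applied for you by dataset.shard(num_gpus, i), and each tower gets its own iterator.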

For a complete tutorial see here.