Tags: python-3.x, tensorflow, computer-vision, large-data

Loading / feed_dict-ing a large dataset into a TensorFlow session


I am trying to consume a 50k-image dataset for a ConvNet, split 60% train / 20% test / 20% validation. So far I have created a placeholder and fed it via feed_dict at sess.run(), as follows:

tf_train_dataset = tf.placeholder(
    tf.float32, shape=(batch_size, image_size, image_size, num_channels))
...

feed_dict = {tf_train_dataset: batch_data, tf_train_labels: batch_labels}
_, l, predictions = session.run(
    [optimizer, loss, train_prediction], feed_dict=feed_dict)

but according to the official TF performance guide this is a poor way to feed data: link to TF guide

Unless for a special circumstance or for example code, do not feed data into the session from Python variables, e.g. dictionary.

# This will result in poor performance.
sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

Can you please help me implement queues for reading data in TF?

One way I found is:

Create an op which loads your data in a streaming fashion.

But 1) I am not sure this is the best way, and 2) I couldn't implement the above suggestion; can you help with pseudo code for this op? Thanks a lot.


Solution

  • It is generally a bad idea to feed data using feed_dict, but you don't always have to write custom ops to process your data. You can convert your image data to a format that TensorFlow can read natively (TFRecords): https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/how_tos/reading_data/convert_to_records.py. The conversion can be done in parallel, and you can shard the output across several files, since TensorFlow can consume a list of files as well.
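    As a rough sketch of that conversion step (not the asker's actual data; the shapes, filename, and dummy arrays here are assumptions for illustration), each image/label pair is serialized into a `tf.train.Example` and written to a TFRecord file:

    ```python
    import numpy as np
    import tensorflow as tf

    def convert_to_tfrecords(images, labels, filename):
        """Write uint8 images of shape (N, H, W, C) and int labels to one TFRecord file."""
        # tf.io.TFRecordWriter in recent TF; older TF 1.x used tf.python_io.TFRecordWriter.
        with tf.io.TFRecordWriter(filename) as writer:
            for image, label in zip(images, labels):
                example = tf.train.Example(features=tf.train.Features(feature={
                    'image_raw': tf.train.Feature(
                        bytes_list=tf.train.BytesList(value=[image.tobytes()])),
                    'label': tf.train.Feature(
                        int64_list=tf.train.Int64List(value=[int(label)])),
                }))
                writer.write(example.SerializeToString())

    # Tiny dummy dataset standing in for the 50k images.
    images = np.random.randint(0, 256, size=(4, 28, 28, 3), dtype=np.uint8)
    labels = np.array([0, 1, 0, 1])
    convert_to_tfrecords(images, labels, 'train.tfrecords')
    ```

    Because each image is written independently, you can split the 50k images into chunks and run this conversion in several processes, producing one shard file per chunk.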

    Then follow the tutorial on this page to create queues and read the data from them instead of feeding it from Python: https://www.tensorflow.org/programmers_guide/reading_data
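    A minimal sketch of that queue pipeline, using the TF 1.x queue API (`tf.train.string_input_producer`, `tf.TFRecordReader`, `tf.train.shuffle_batch`); the toy file, shapes, and queue capacities below are assumptions for illustration, not the asker's setup:

    ```python
    import numpy as np
    import tensorflow as tf

    image_size, num_channels, batch_size = 8, 3, 4

    # Write a few dummy records so the pipeline below has something to read.
    with tf.python_io.TFRecordWriter('toy.tfrecords') as writer:
        for label in range(8):
            image = np.random.randint(
                0, 256, size=(image_size, image_size, num_channels), dtype=np.uint8)
            example = tf.train.Example(features=tf.train.Features(feature={
                'image_raw': tf.train.Feature(
                    bytes_list=tf.train.BytesList(value=[image.tobytes()])),
                'label': tf.train.Feature(
                    int64_list=tf.train.Int64List(value=[label])),
            }))
            writer.write(example.SerializeToString())

    # Filename queue: cycles over the file list and hands filenames to readers.
    filename_queue = tf.train.string_input_producer(['toy.tfrecords'])
    reader = tf.TFRecordReader()
    _, serialized = reader.read(filename_queue)
    features = tf.parse_single_example(serialized, features={
        'image_raw': tf.FixedLenFeature([], tf.string),
        'label': tf.FixedLenFeature([], tf.int64),
    })
    image = tf.decode_raw(features['image_raw'], tf.uint8)
    image = tf.reshape(image, [image_size, image_size, num_channels])
    image = tf.cast(image, tf.float32) / 255.0
    label = tf.cast(features['label'], tf.int32)

    # Batch queue filled by background threads; sess.run() dequeues from it
    # instead of waiting on a Python feed_dict.
    images, labels = tf.train.shuffle_batch(
        [image, label], batch_size=batch_size,
        capacity=100, min_after_dequeue=20)

    with tf.Session() as sess:
        coord = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(sess=sess, coord=coord)
        batch_images, batch_labels = sess.run([images, labels])  # no feed_dict
        coord.request_stop()
        coord.join(threads)
    ```

    Your training step would then take `images` and `labels` tensors directly as graph inputs, replacing the placeholder and the `feed_dict=` argument entirely.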