Tags: python, tensorflow, machine-learning, neural-network, mnist

TensorFlow data processing using the UCI dataset


I am trying to use TensorFlow to recognize handwritten digits from the UCI dataset (https://archive.ics.uci.edu/ml/datasets/Optical+Recognition+of+Handwritten+Digits). Each line is a flattened 8*8 matrix of image pixels, and the last attribute is the class code 0-9. However, the tutorials I followed used the MNIST data, which is quite different: each image is a 28*28 matrix with values from 0 to 255. The setup there looked something like this:

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/tmp/data", one_hot=True)
x = tf.placeholder('float', [None, 784])
y = tf.placeholder('float')

Since I am quite new to TensorFlow, I could not prepare the neural network model for the UCI data. I just want a few pointers on how to proceed. I have two main questions:

  1. Is this the correct way to import the data?
  2. How do I get the 'y' label from the last attribute?

Currently I am thinking of doing something like this:

filename_queue = tf.train.string_input_producer(["optdigits.tra"])
reader = tf.TextLineReader()
_, serialized_example = reader.read(filename_queue)
image, label = decode(serialized_example)  # decode() is not defined yet
x = tf.placeholder('float', [None, 64])
y = tf.placeholder('float')

Basically, I want to prepare an input layer with 64 nodes and a 'y' label as the target output, so that I can train the NN model.
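
The `decode` step in the snippet above is left undefined. As a rough sketch only, assuming each line of optdigits.tra holds 65 comma-separated integers (64 pixel values followed by the class code), a plain-Python version of that parsing might look like the following. The name `decode_line` is hypothetical and not part of TensorFlow, and the sample line is made up.

```python
def decode_line(line, num_features=64):
    """Split one CSV line into (features, label).

    Hypothetical helper: assumes num_features integer pixel values
    followed by a single integer class code.
    """
    values = [int(v) for v in line.strip().split(",")]
    if len(values) != num_features + 1:
        raise ValueError(
            "expected %d values, got %d" % (num_features + 1, len(values))
        )
    # Everything before the last column is a feature; the last column is the label.
    return values[:num_features], values[num_features]


# Made-up sample line: 64 pixel values, then the label 7.
sample = ",".join(["0"] * 63 + ["16", "7"])
features, label = decode_line(sample)
print(len(features), label)
```

The same split (first 64 columns as input, last column as label) applies whichever loader is used; only the parsing mechanism changes.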


Solution

  • I am also new to this, so this is probably not the best way to do it. I used numpy to import the data and then converted it to a TensorFlow tensor.

    import tensorflow as tf
    import numpy as np
    
    # load the comma-separated file as integers
    trainingDataSet_ = np.loadtxt('/data/optdigits.tra', delimiter=',', dtype=np.int32)
    trainingDataSet = tf.convert_to_tensor(trainingDataSet_, dtype=tf.int32)
    
    # store labels of each sample
    y = trainingDataSet[:, 64]
    
    # remove labels from features
    x = trainingDataSet[:, :64]
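
The MNIST tutorial loads its labels with `one_hot=True`, so a model written against it expects one-hot label vectors rather than class codes. A minimal numpy sketch of that conversion, together with scaling the features to [0, 1], could look like this. It assumes pixel values in the range 0-16, as the UCI dataset description states, and uses two made-up rows in place of the real file. `to_one_hot` is a hypothetical helper name.

```python
import numpy as np


def to_one_hot(labels, num_classes=10):
    """Turn integer class codes of shape (n,) into a one-hot matrix (n, num_classes)."""
    one_hot = np.zeros((labels.shape[0], num_classes), dtype=np.float32)
    one_hot[np.arange(labels.shape[0]), labels] = 1.0
    return one_hot


# Two made-up rows standing in for the array returned by np.loadtxt.
data = np.zeros((2, 65), dtype=np.int32)
data[0, :64] = 16   # all pixels at the assumed maximum value
data[0, 64] = 3     # class code of the first row
data[1, 64] = 9     # class code of the second row

x_np = data[:, :64].astype(np.float32) / 16.0  # features scaled to [0, 1]
y_np = to_one_hot(data[:, 64])                 # shape (2, 10)

print(x_np.shape, y_np.shape)
```

Arrays shaped like `x_np` and `y_np` match a `[None, 64]` input placeholder and a 10-class one-hot label, so they could be passed in through `feed_dict` during training.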