Tags: python, python-2.7, neural-network, tensorflow, perceptron

Problems with multi-layer perceptron in Tensorflow


I created a multi-layer perceptron (i.e. a neural network with fully connected layers) in TensorFlow with one hidden layer (with a ReLU activation function) and ran it on MNIST data successfully, getting over 90% accuracy. But when I add a second hidden layer, I get a very low accuracy (about 10%) even after many mini-batches of stochastic gradient descent. Any ideas why this happens? I can add my Python code to this post if it would be helpful.

Here is my graph code (based on the Udacity course's starter code, with the additional layer added). Note that some parts are commented out for simplicity, but even with this simpler version the symptom remains the same: accuracy of roughly 10% even after many iterations:

import tensorflow as tf

batch_size = 128
hidden_size = 256
train_subset = 10000

# image_size (28) and num_labels (10), as well as train_dataset/train_labels,
# valid_dataset/valid_labels and test_dataset/test_labels, come from the
# Udacity starter notebook's preprocessing cells.
image_size = 28
num_labels = 10

graph = tf.Graph()
with graph.as_default():

  # Input data. For the training data, we use a placeholder that will be fed
  # at run time with a training minibatch.
  tf_train_dataset = tf.placeholder(tf.float32,
                                    shape=(batch_size, image_size * image_size))
  tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
  #tf_train_dataset = tf.constant(train_dataset[:train_subset, :])
  #tf_train_labels = tf.constant(train_labels[:train_subset])  

  tf_valid_dataset = tf.constant(valid_dataset)
  tf_test_dataset = tf.constant(test_dataset)

  # Variables.
  weightsToHidden1 = tf.Variable(
    tf.truncated_normal([image_size * image_size, hidden_size]))
  biasesToHidden1 = tf.Variable(tf.zeros([hidden_size]))

  weightsToHidden2 = tf.Variable(
    tf.truncated_normal([hidden_size, hidden_size]))
  biasesToHidden2 = tf.Variable(tf.zeros([hidden_size]))

  weightsToOutput = tf.Variable(
    tf.truncated_normal([hidden_size, num_labels]))
  biasesToOutput = tf.Variable(tf.zeros([num_labels]))

  # Training computation.    
  logitsToHidden1 = tf.nn.relu(tf.matmul(tf_train_dataset, weightsToHidden1) 
                          + biasesToHidden1)

  validLogitsToHidden1 = tf.nn.relu(tf.matmul(tf_valid_dataset, weightsToHidden1) 
                          + biasesToHidden1)

  testLogitsToHidden1 = tf.nn.relu(tf.matmul(tf_test_dataset, weightsToHidden1) 
                          + biasesToHidden1)

  logitsToHidden2 = tf.nn.relu(tf.matmul(logitsToHidden1, weightsToHidden2) 
                          + biasesToHidden2)

  validLogitsToHidden2 = tf.nn.relu(tf.matmul(validLogitsToHidden1, weightsToHidden2) 
                          + biasesToHidden2)

  testLogitsToHidden2 = tf.nn.relu(tf.matmul(testLogitsToHidden1, weightsToHidden2) 
                          + biasesToHidden2)


  logitsToOutput = tf.matmul(logitsToHidden2, weightsToOutput) + biasesToOutput
  validLogitsToOutput = tf.matmul(validLogitsToHidden2, weightsToOutput) + biasesToOutput
  testLogitsToOutput = tf.matmul(testLogitsToHidden2, weightsToOutput) + biasesToOutput


  loss = (tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits=logitsToOutput,
                                            labels=tf_train_labels))) #+
   # tf.nn.l2_loss(weightsToHidden1) * 0.002 + 
    #tf.nn.l2_loss(weightsToHidden2) * 0.002 + 
    #tf.nn.l2_loss(weightsToOutput) * 0.002)

  # Optimizer.
  optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(loss)

  # Predictions for the training, validation, and test data.
  train_prediction = tf.nn.softmax(logitsToOutput)
  valid_prediction = tf.nn.softmax(validLogitsToOutput)
  test_prediction = tf.nn.softmax(testLogitsToOutput)
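(For reference, the session code that drives this graph is not shown above; the sketch below assumes the standard Udacity-style training loop, where train_dataset/train_labels etc. are the NumPy arrays prepared by the starter notebook and accuracy is the usual argmax-comparison helper.)

import numpy as np

num_steps = 3001

def accuracy(predictions, labels):
  # percentage of rows whose predicted class (argmax) matches the one-hot label
  return 100.0 * np.sum(np.argmax(predictions, 1) == np.argmax(labels, 1)) / predictions.shape[0]

with tf.Session(graph=graph) as session:
  tf.initialize_all_variables().run()  # tf.global_variables_initializer() in newer TF
  for step in range(num_steps):
    # pick a minibatch by cycling an offset through the training data
    offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
    batch_data = train_dataset[offset:(offset + batch_size), :]
    batch_labels = train_labels[offset:(offset + batch_size), :]
    feed_dict = {tf_train_dataset: batch_data, tf_train_labels: batch_labels}
    _, l, predictions = session.run([optimizer, loss, train_prediction],
                                    feed_dict=feed_dict)
    if step % 500 == 0:
      print("Minibatch loss at step %d: %f" % (step, l))
      print("Minibatch accuracy: %.1f%%" % accuracy(predictions, batch_labels))
      print("Validation accuracy: %.1f%%" % accuracy(valid_prediction.eval(), valid_labels))
  print("Test accuracy: %.1f%%" % accuracy(test_prediction.eval(), test_labels))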

Solution

  • Change your learning rate to 0.01 or an even smaller value. This helps, but in my case the accuracy is still worse than with the two-layer perceptron.
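With the question's code, the only required change is the optimizer line. Scaling down the initial weights is a common complementary fix for deeper ReLU networks, but that part is an assumption beyond the answer above, not something it states:

  import math

  # Smaller learning rate: with stddev-1.0 truncated_normal initial weights,
  # a rate of 0.5 makes the deeper network's activations and gradients blow up.
  optimizer = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

  # Optional (not part of the answer above): He-style initialization,
  # stddev = sqrt(2 / fan_in), keeps ReLU activations in a reasonable range.
  weightsToHidden1 = tf.Variable(
    tf.truncated_normal([image_size * image_size, hidden_size],
                        stddev=math.sqrt(2.0 / (image_size * image_size))))
  weightsToHidden2 = tf.Variable(
    tf.truncated_normal([hidden_size, hidden_size],
                        stddev=math.sqrt(2.0 / hidden_size)))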