Tags: python, tensorflow, neural-network, loss, tf-slim

Tensorflow neural network loss value NaN


I'm trying to build a simple multilayer perceptron model on a large data set, but I'm getting NaN as the loss value. The weird thing is that after the first training step the loss value is not NaN and is about 46 (which is oddly low; when I run a logistic regression model on the same data, the first loss value is about ~3600). Right after that, though, the loss value is constantly NaN. I used tf.Print to try to debug it as well.

The goal of the model is to predict ~4500 different classes, so it's a classification problem. Using tf.Print, I see that after the first training step (i.e. the first forward pass through the MLP), the predictions coming out of the last fully connected layer look reasonable (the argmax values vary across the roughly 4500 classes). After that, however, the outputs of the last fully connected layer collapse to all 0's or some other constant value (0 0 0 0 0).
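To narrow down where the NaNs first show up, one option in TensorFlow 1.x (which TF-Slim implies) is to wrap suspect tensors in tf.check_numerics, which raises an error the moment a value becomes NaN or Inf. A minimal sketch, with placeholder logits and labels standing in for the real model:

import tensorflow as tf

# Placeholder tensors only; shapes mirror the batch size (1000) and class count (~4500).
logits = tf.random_normal([1000, 4500])
labels = tf.one_hot(tf.zeros([1000], dtype=tf.int32), 4500)
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits))

# Raises an InvalidArgumentError as soon as `loss` contains NaN or Inf,
# which pinpoints the step where things go wrong.
checked_loss = tf.check_numerics(loss, message="loss became NaN/Inf")

with tf.Session() as sess:
    print(sess.run(checked_loss))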

For some information about my model:

  • 3-layer model; all fully connected layers.

  • Batch size of 1000.

  • Learning rate of 0.001 (I also tried 0.1 and 0.01, but nothing changed).

  • Using cross-entropy loss (I did add an epsilon value to prevent log(0)).

  • Using AdamOptimizer.

  • Learning rate decay of 0.95.

The exact code for the model is below: (I'm using the TF-Slim library)

# Model body, built with TF-Slim fully connected layers.
input_layer = slim.fully_connected(model_input, 5000, activation_fn=tf.nn.relu)
hidden_layer = slim.fully_connected(input_layer, 5000, activation_fn=tf.nn.relu)
output = slim.fully_connected(hidden_layer, vocab_size, activation_fn=tf.nn.relu)
# Print the argmax over classes for the first 10 steps to debug the predictions.
output = tf.Print(output, [tf.argmax(output, 1)], 'out = ', summarize=20, first_n=10)
return {"predictions": output}

Any help would be greatly appreciated! Thank you so much!


Solution

  • Two (possibly more) reasons why it doesn't work:

    1. You skipped, or applied inappropriately, feature scaling of your inputs and outputs. As a result, the data may be numerically difficult for TensorFlow to handle.
    2. ReLU is not differentiable at zero and can lead to dead or exploding activations, which may cause issues here. Try other activation functions, such as tanh or sigmoid (see the sketch below).
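
A minimal sketch of both suggestions, assuming the raw features live in a NumPy array (the shapes, sizes, and variable names below are placeholders, not taken from the question):

import numpy as np
import tensorflow as tf
import tensorflow.contrib.slim as slim

# 1) Feature scaling: standardize the training features once and reuse the
#    same mean/std at evaluation time.
train_features = np.random.rand(10000, 128).astype(np.float32)   # placeholder data
mean = train_features.mean(axis=0)
std = train_features.std(axis=0) + 1e-8                          # avoid division by zero
scaled_features = (train_features - mean) / std

# 2) Swap ReLU for tanh in the hidden layers, and leave the final layer linear
#    so it produces unbounded logits for the ~4500 classes.
vocab_size = 4500
model_input = tf.placeholder(tf.float32, [None, 128])
hidden_1 = slim.fully_connected(model_input, 5000, activation_fn=tf.nn.tanh)
hidden_2 = slim.fully_connected(hidden_1, 5000, activation_fn=tf.nn.tanh)
logits = slim.fully_connected(hidden_2, vocab_size, activation_fn=None)

Keeping the output layer linear and applying the softmax inside the loss is a common way to avoid the all-zero outputs described in the question, since a ReLU on the final layer clips every negative logit to 0.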