
Loss function giving nan in pytorch


In PyTorch, I have a loss function that includes a 1/x term plus a few other terms. The last layer of my neural net is a sigmoid, so its outputs lie between 0 and 1.

Some value fed into 1/x must be getting really small at some point, because my loss has become this:

loss: 11.047459  [729600/235474375]
loss: 9.348356  [731200/235474375]
loss: 7.184393  [732800/235474375]
loss: 8.699876  [734400/235474375]
loss: 7.178806  [736000/235474375]
loss: 8.090066  [737600/235474375]
loss: 12.415799  [739200/235474375]
loss: 10.422441  [740800/235474375]
loss: 8.335846  [742400/235474375]
loss:     nan  [744000/235474375]
loss:     nan  [745600/235474375]
loss:     nan  [747200/235474375]
loss:     nan  [748800/235474375]
loss:     nan  [750400/235474375]

I'm wondering if there's any way to "rewind" when nan is hit, or to define the loss function so that nan is never produced in the first place? Thanks!


Solution

  • Your loss is jumping all over the place instead of steadily decreasing, which suggests the learning rate is too high: the optimizer is overshooting the minimum and bouncing back and forth across it. Have you tried decreasing your learning rate?
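    To see why a too-high learning rate makes the loss bounce rather than decrease, here is a minimal self-contained sketch (plain gradient descent on f(x) = x², not the poster's actual model or loss):

    ```python
    # Gradient descent on f(x) = x^2, whose gradient is 2x.
    # Each update is x <- x - lr * 2x = x * (1 - 2*lr), so the iterate
    # shrinks toward the minimum when |1 - 2*lr| < 1 and blows up otherwise.
    def descend(lr, steps=20, x=1.0):
        for _ in range(steps):
            x = x - lr * 2 * x
        return x

    print(abs(descend(lr=0.1)))  # small lr: converges toward the minimum at 0
    print(abs(descend(lr=1.1)))  # large lr: overshoots and diverges
    ```

    The same overshoot in a real network can push the sigmoid output toward 0, making the 1/x term explode and eventually produce nan.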

    To answer your question about rewinding: ideally you shouldn't have to rewind, because with a well-tuned learning rate the loss decreases steadily. You may also want to look into learning rate schedulers.
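    For the scheduler suggestion, a minimal PyTorch sketch might look like the following (the dummy parameter, `step_size`, and `gamma` values are made up for illustration):

    ```python
    import torch

    param = torch.zeros(1, requires_grad=True)  # stand-in for real model parameters
    optimizer = torch.optim.SGD([param], lr=0.1)
    # StepLR multiplies the learning rate by gamma every step_size epochs.
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=2, gamma=0.5)

    for epoch in range(4):
        # ... forward pass, loss.backward(), etc. would go here ...
        optimizer.step()
        scheduler.step()
        print(epoch, optimizer.param_groups[0]["lr"])
    ```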

    See also: why the learning rate needs to be tuned.
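    As for defining the loss so that nan is never hit: one standard safeguard (not part of the answer above, just a common trick) is to clamp the value fed into 1/x away from zero:

    ```python
    import torch

    eps = 1e-6  # small floor for the denominator; the exact value is a tuning choice

    def safe_reciprocal(x):
        # Sigmoid outputs lie in (0, 1); clamping keeps the denominator
        # from underflowing to 0, so 1/x stays finite.
        return 1.0 / x.clamp(min=eps)

    x = torch.tensor([0.0, 1e-9, 0.5])
    print(safe_reciprocal(x))  # finite everywhere, even at exactly 0
    ```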