I'm trying to implement a simple NN in Torch to learn more about it. I created a very simple dataset: binary numbers from 0 to 15 and my goal is to classify the numbers into two classes - class 1 are numbers 0-3 and 12-15, class 2 are the remaining ones. The following code is what i have now (i have removed the data loading routine only):
require 'torch'
require 'nn'
data = torch.Tensor( 16, 4 )
class = torch.Tensor( 16, 1 )
network = nn.Sequential()
network:add( nn.Linear( 4, 8 ) )
network:add( nn.ReLU() )
network:add( nn.Linear( 8, 2 ) )
network:add( nn.LogSoftMax() )
criterion = nn.ClassNLLCriterion()
for i = 1, 300 do
prediction = network:forward( data )
--print( "prediction: " .. tostring( prediction ) )
--print( "class: " .. tostring( class ) )
loss = criterion:forward( prediction, class )
network:zeroGradParameters()
grad = criterion:backward( prediction, class )
network:backward( data, grad )
network:updateParameters( 0.1 )
end
This is how the data and class Tensors look like:
0 0 0 0
0 0 0 1
0 0 1 0
0 0 1 1
0 1 0 0
0 1 0 1
0 1 1 0
0 1 1 1
1 0 0 0
1 0 0 1
1 0 1 0
1 0 1 1
1 1 0 0
1 1 0 1
1 1 1 0
1 1 1 1
[torch.DoubleTensor of size 16x4]
2
2
2
2
1
1
1
1
1
1
1
1
2
2
2
2
[torch.DoubleTensor of size 16x1]
Which is what I expect it to be. However when running this code, i get the following error on line loss = criterion:forward( prediction, class ):
torch/install/share/lua/5.1/nn/ClassNLLCriterion.lua:69: attempt to perform arithmetic on a nil value
When i modify the training routine like this (processing a single data point at a time instead of all 16 in a batch) it works and the network successfully learns to recognize the two classes:
for k = 1, 300 do
for i = 1, 16 do
prediction = network:forward( data[i] )
--print( "prediction: " .. tostring( prediction ) )
--print( "class: " .. tostring( class ) )
loss = criterion:forward( prediction, class[i] )
network:zeroGradParameters()
grad = criterion:backward( prediction, class[i] )
network:backward( data[i], grad )
network:updateParameters( 0.1 )
end
end
I'm not sure what might be wrong with the "batch processing" i'm trying to do. A brief look at the ClassNLLCriterion didn't help, it seems i'm giving it the expected input (see below), but it still fails. The input it receives (prediction and class Tensors) looks like this:
-0.9008 -0.5213
-0.8591 -0.5508
-0.9107 -0.5146
-0.8002 -0.5965
-0.9244 -0.5055
-0.8581 -0.5516
-0.9174 -0.5101
-0.8040 -0.5934
-0.9509 -0.4884
-0.8409 -0.5644
-0.8922 -0.5272
-0.7737 -0.6186
-0.9422 -0.4939
-0.8405 -0.5648
-0.9012 -0.5210
-0.7820 -0.6116
[torch.DoubleTensor of size 16x2]
2
2
2
2
1
1
1
1
1
1
1
1
2
2
2
2
[torch.DoubleTensor of size 16x1]
Can someone help me out here? Thanks.
Experience has shown that nn.ClassNLLCriterion
expects target to be a 1D tensor of size batch_size
or a scalar. Your class
is a 2D one (batch_size x 1
) but class[i]
is 1D, that's why your non-batch version works.
So, this will solve your problem:
class = class:view(-1)
Alternatively, you can replace
network:add( nn.LogSoftMax() )
criterion = nn.ClassNLLCriterion()
with the equivalent:
criterion = nn.CrossEntropyCriterion()
The interesting thing is that nn.CrossEntropyCriterion
is also able to take a 2D tensor. Why is nn.ClassNLLCriterion
not?