lua machine-learning torch conv-neural-network

torch7: Error in fine-tuning a CNN


I am trying to fine-tune a pre-trained OverFeat convolutional network (CNN) on a multi-class dataset of face images. My training and testing Lua script is based on the tutorial given here.

I first wrote the script and tested it by training and testing on a subset of the ImageNet dataset. After I resolved a few issues, it ran as expected without any errors. I then made slight changes to adapt it to the new dataset: adding a few layers to the network and changing the input files and labels. The updated model to be fine-tuned is as follows:

   net:add(SpatialConvolution(3, 96, 7, 7, 2, 2))
   net:add(nn.ReLU(true))
   net:add(SpatialMaxPooling(3, 3, 3, 3))
   net:add(SpatialConvolutionMM(96, 256, 7, 7, 1, 1))
   net:add(nn.ReLU(true))
   net:add(SpatialMaxPooling(2, 2, 2, 2))
   net:add(SpatialConvolutionMM(256, 512, 3, 3, 1, 1, 1, 1))
   net:add(nn.ReLU(true))
   net:add(SpatialConvolutionMM(512, 512, 3, 3, 1, 1, 1, 1))
   net:add(nn.ReLU(true))
   net:add(SpatialConvolutionMM(512, 1024, 3, 3, 1, 1, 1, 1))
   net:add(nn.ReLU(true))
   net:add(SpatialConvolutionMM(1024, 1024, 3, 3, 1, 1, 1, 1))
   net:add(nn.ReLU(true))
   net:add(SpatialMaxPooling(3, 3, 3, 3))
   net:add(SpatialConvolutionMM(1024, 4096, 5, 5, 1, 1))
   net:add(nn.ReLU(true))
   net:add(SpatialConvolutionMM(4096, 4096, 1, 1, 1, 1))
   net:add(nn.ReLU(true))
   net:add(SpatialConvolutionMM(4096, 1000, 1, 1, 1, 1))
   -- net:add(nn.View(1000))
   net:add(nn.ReLU(true))
   net:add(SpatialConvolutionMM(1000, 530, 1, 1, 1, 1))
   net:add(nn.View(530))
   net:add(nn.SoftMax())

I am using nn.ClassNLLCriterion() to train my network, but training fails with the following error:

==> online epoch # 1 [batchSize = 8]    
/home/adarshc/torch/install/bin/luajit: ...shc/torch/install/share/lua/5.1/nn/ClassNLLCriterion.lua:38: attempt to call method 'type' (a nil value)
stack traceback:
    ...shc/torch/install/share/lua/5.1/nn/ClassNLLCriterion.lua:38: in function 'forward'
    final_2.lua:486: in function 'opfunc'
    /home/adarshc/torch/install/share/lua/5.1/optim/sgd.lua:43: in function 'optimMethod'
    final_2.lua:509: in function 'train'
    final_2.lua:613: in main chunk
    [C]: in function 'dofile'
    ...rshc/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:131: in main chunk
    [C]: at 0x00406670

I have been unable to resolve it: I find this error message uninformative, which makes it difficult to trace and debug. Can someone please help me resolve this issue?

Thanks in advance.


Solution

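  • When you call forward on your criterion, the second argument (the target) does not have the right type, hence this error. The message says the criterion tried to call target:type() on a value that has no such method, so the target is not a torch tensor (a plain Lua table, for example, would fail this way).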

    Since you apparently work with mini-batches, you are supposed to pass a 1D torch.LongTensor of size N, where N is the mini-batch size (in non-batch mode the target can be a number or a single-element 1D LongTensor).

    Note: at training time, the layer right before nn.ClassNLLCriterion should be nn.LogSoftMax(), not nn.SoftMax(), because the criterion expects log-probabilities as input. Alternatively, there is a built-in criterion that combines both and takes the raw scores directly: nn.CrossEntropyCriterion.
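
    As a rough illustration of both points, here is a minimal sketch. It is not the asker's script: the random scores stand in for the output of the last convolution (before any softmax), and the label values and the names scores, labels and target are made up for the example. It assumes the labels were collected in a plain Lua table and shows one way to turn them into the 1D LongTensor the criterion expects.

```lua
require 'torch'
require 'nn'

local batchSize, nClasses = 8, 530

-- Stand-in for the network output: raw class scores, one row per sample.
-- (Assumption: in the real script this would come from net:forward(inputs).)
local scores = torch.randn(batchSize, nClasses)

-- Hypothetical labels gathered in a plain Lua table (class indices are 1-based).
local labels = {3, 17, 530, 1, 42, 8, 99, 256}

-- Convert the table to a 1D LongTensor of size batchSize before
-- handing it to the criterion.
local target = torch.LongTensor(labels)

-- Option 1: LogSoftMax followed by ClassNLLCriterion.
local logProbs = nn.LogSoftMax():forward(scores)
local nll = nn.ClassNLLCriterion():forward(logProbs, target)

-- Option 2: CrossEntropyCriterion applied to the raw scores.
local ce = nn.CrossEntropyCriterion():forward(scores, target)

-- Both options compute the same loss value.
print(nll, ce)
```

    With the second option, the final nn.SoftMax() layer would be dropped from the network so that the criterion receives unnormalized scores.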