Tags: matlab, deep-learning, object-detection, matconvnet

How does a deep network accept images of different scales in object detection?


A network built with MatConvNet accepts images of different scales and evaluates them. For example:

% raw_img is an image of size 730x860x3
% net is a loaded DagNN object
% averageImage is the mean image the network was trained with (from net.meta)
scales = [-2 -1 0 0.5 1];
for s = 2.^scales
    img = imresize(raw_img, s, 'bilinear');          % rescale the input
    img = bsxfun(@minus, single(img), averageImage); % subtract the mean
    inputs = {'data', img};
    net.eval(inputs);                                % forward pass at this scale
end

While debugging, I found that img is resized and evaluated on every iteration of the loop. But I thought the network (net) was supposed to accept a fixed-size input:

K>> net

net = 

  DagNN with properties:

                 layers: [1x319 struct]
                   vars: [1x323 struct]
                 params: [1x381 struct]
                   meta: [1x1 struct]
                      m: []
                      v: []
                   mode: 'test'
                 holdOn: 0
    accumulateParamDers: 0
         conserveMemory: 1
        parameterServer: []
                 device: 'cpu'

After loading the trained network:

K>> net.vars(1, 1).value

ans =

     []

And inside the for loop (iteration 1):

K>> net.vars(1, 1).value

ans =

     [64 64 3]

(iteration 2)

K>> net.vars(1, 1).value

ans =

     [160 160 3]

and so on. So how does the DagNN handle such input and evaluate itself? (I am new to MatConvNet and couldn't find any help in the documentation, so please explain this and suggest how to build something similar in Keras.)


Solution

  • In general, a ConvNet does not care about the input size of an image. All the layers perform convolution-like operations (e.g., even the pooling layers behave like convolutions spatially). If you provide a larger input, you get a larger output. The only thing that cares about the input size is the loss layer; if you don't have a loss layer, the code won't break at all. There is no such thing as a fully connected layer in MatConvNet, everything is convolutional. (A Keras sketch of the same idea follows below.)

    BTW, that's why some people who worked on ConvNets early on find FCN (fully convolutional network) a funny name, because there is really no difference between a fully connected layer and a convolutional layer.
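
    The same idea carries over to Keras: leave the spatial dimensions of the input as None and use only convolution/pooling layers, with a 1x1 convolution in place of a fully connected layer. Below is a minimal sketch, not the asker's actual network; the layer widths and the 5-channel output head are arbitrary placeholders.

    import numpy as np
    from tensorflow import keras
    from tensorflow.keras import layers

    # Height and width are left as None, so the model accepts any spatial size,
    # just like the MatConvNet DagNN above.
    inputs = keras.Input(shape=(None, None, 3))
    x = layers.Conv2D(16, 3, padding='same', activation='relu')(inputs)
    x = layers.MaxPooling2D(2)(x)
    x = layers.Conv2D(32, 3, padding='same', activation='relu')(x)
    # A 1x1 convolution plays the role of a "fully connected" layer,
    # which is why the input size never has to be fixed.
    outputs = layers.Conv2D(5, 1, activation='sigmoid')(x)
    model = keras.Model(inputs, outputs)

    # Feed two different scales; only the spatial size of the output changes.
    for size in (64, 160):
        img = np.random.rand(1, size, size, 3).astype('float32')
        print(img.shape, '->', model.predict(img, verbose=0).shape)

    Running this prints output shapes that grow with the input (e.g. 64x64 in, 32x32 out; 160x160 in, 80x80 out), which is exactly the behaviour observed in the MatConvNet loop.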