Caffe, how to predict from a pretrained net


I'm using this code to load my net:

net = caffe.Classifier(MODEL_FILE, PRETRAINED,
                   mean=np.load(caffe_root + 'python/caffe/imagenet/ilsvrc_2012_mean.npy').mean(1).mean(1),
                   channel_swap=(2,1,0),
                   raw_scale=255,
                   image_dims=(256, 256))

I have doubts about three lines.

1- mean=np.load(caffe_root + 'python/caffe/imagenet/ilsvrc_2012_mean.npy').mean(1).mean(1)

What is mean? Should I use this mean value or another one? If another, where can I get a custom mean value? I'm using a custom dataset.

2- channel_swap=(2,1,0)

What does channel_swap mean? And again, should I use this value or a custom value?

And the last one:

3- raw_scale=255

What is raw_scale? And what value should I use?

I'm using the Cohn-Kanade dataset. All images are 64x64 and grayscale.


Solution

  • channel_swap reverses RGB into BGR, which is apparently necessary if you use a reference ImageNet model, based on a comment in [1]. In your case the images are greyscale, so you probably do not have three channels. You might need to set it to (0, 0, 0), but even that might not help (I am unsure of the exact implementation of channel_swap). If that does not help, the simplest solution might be to preprocess your data by copying every pixel value into three equal channels (RGB). After that you can drop channel_swap altogether, because the channels are identical and swapping them is a no-op.
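The channel replication suggested above can be sketched in NumPy. This is only an illustration under assumptions: `gray` is a made-up 64x64 array standing in for one of your images, and the (height, width, channels) layout is assumed to match what you pass to the classifier.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 64x64 greyscale image with values in [0, 1].
gray = rng.random((64, 64)).astype(np.float32)

# Copy the single channel three times: (64, 64) -> (64, 64, 3).
rgb = np.repeat(gray[:, :, np.newaxis], 3, axis=2)
print(rgb.shape)  # (64, 64, 3)

# Reversing the channel order (the effect of channel_swap=(2, 1, 0))
# changes nothing when all three channels are identical.
swapped = rgb[:, :, [2, 1, 0]]
print(np.array_equal(rgb, swapped))  # True
```

With identical channels the swap has no effect, which is why channel_swap can be left out in that case.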

    Mean is what will be subtracted from your input data to center it. (Neural networks generally train better on roughly zero-centered inputs, while image pixel values are non-negative and so have a positive mean, hence the subtraction.) The mean you subtract should be the same one that was used for training, so using the mean from the file associated with the model is correct. I am not sure, however, whether you should call .mean(1) on it. Did you get that line from some example? If so, it is most likely the correct thing to do.
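To see what the `.mean(1).mean(1)` call does, here is a small NumPy sketch. It assumes the mean file holds an array of shape (3, 256, 256), i.e. (channels, height, width). `mean_image` and `train_images` are random stand-ins, not real data. The last part shows one possible way to get a mean for a custom greyscale dataset, which only makes sense if the net was trained with that same mean.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for np.load(".../ilsvrc_2012_mean.npy"); assumed shape (3, 256, 256).
mean_image = rng.uniform(0, 255, size=(3, 256, 256))

# The first .mean(1) averages over height -> shape (3, 256);
# the second .mean(1) averages over width -> shape (3,): one value per channel.
per_channel_mean = mean_image.mean(1).mean(1)
print(per_channel_mean.shape)  # (3,)

# Hypothetical custom dataset: 100 greyscale training images, 64x64, values in [0, 255].
train_images = rng.uniform(0, 255, size=(100, 64, 64))

# A single mean over all training pixels, kept as a 1-element array (one channel).
custom_mean = np.array([train_images.mean()])
print(custom_mean.shape)  # (1,)
```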

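    raw_scale is a factor that your input data is multiplied by before the mean is subtracted. caffe.io.load_image returns pixel values between 0 and 1, while the reference ImageNet models were trained on values between 0 and 255, so raw_scale set to 255 is correct if you load images that way and use such a model. If your data is already in the range the model was trained on, raw_scale should be set to 1.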
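How the options fit together may be easier to see in code. The function below is a NumPy sketch of the order in which, as far as I can tell from caffe/io.py, the preprocessing is applied: move channels first, swap channels, multiply by raw_scale, then subtract the mean. `preprocess_sketch` is a hypothetical helper, not part of Caffe, and it skips resizing.

```python
import numpy as np

def preprocess_sketch(img, mean=None, channel_swap=None, raw_scale=None):
    """Mimic the assumed pycaffe preprocessing order on an (H, W, C) image."""
    out = img.astype(np.float32).transpose(2, 0, 1)  # (H, W, C) -> (C, H, W)
    if channel_swap is not None:
        out = out[list(channel_swap), :, :]          # e.g. RGB -> BGR
    if raw_scale is not None:
        out = out * raw_scale                        # e.g. [0, 1] -> [0, 255]
    if mean is not None:
        # Per-channel mean, broadcast over height and width.
        out = out - np.asarray(mean, dtype=np.float32).reshape(-1, 1, 1)
    return out

# Hypothetical 4x4 RGB image with every pixel equal to 0.5 (values in [0, 1]).
img = np.full((4, 4, 3), 0.5)
out = preprocess_sketch(img, mean=(100.0, 110.0, 120.0),
                        channel_swap=(2, 1, 0), raw_scale=255)
print(out.shape)     # (3, 4, 4)
print(out[:, 0, 0])  # per-channel values: 27.5, 17.5, 7.5
```

Because the mean is subtracted after the swap and the scaling in this sketch, it has to be given in the swapped channel order and in the scaled value range.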

    Finally, based on my understanding of the comment in [2], you do not need to provide image_dims: by default the input is scaled to the net's input size.

    [1] https://github.com/BVLC/caffe/blob/master/python/caffe/io.py#L204

    [2] https://github.com/BVLC/caffe/blob/master/python/caffe/classifier.py#L18