How to interpret the file mean.binaryproto when loading a Neural Network?

I want to load a Neural Network that has been trained with caffe for image classification.

The model ships with a file mean.binaryproto, which holds the means to be subtracted from an image before it is passed to the network for classification.

I am trying to understand what is contained in this file so I used Google Colab to see what is inside it.

The code to load it is the following:

# Load the Drive helper and mount
from google.colab import drive

# This will prompt for authorization.
drive.mount('/content/drive')
!ls "/content/drive/My Drive"

# Install Caffe (refresh the package index before installing)
!apt update
!apt install -y caffe-cuda
!ls "/content/drive/My Drive/NeuralNetwork/CNRPark-Trained-Models/mAlexNet-on-CNRPark/"
import caffe
import numpy as np
with open('/content/drive/My Drive/NeuralNetwork/CNRPark-Trained-Models/mAlexNet-on-CNRPark/mean.binaryproto', 'rb') as f:
    blob = caffe.proto.caffe_pb2.BlobProto()
    blob.ParseFromString(f.read())
    arr = np.array( caffe.io.blobproto_to_array(blob) )
    print(arr.shape)
    out = arr[0]
    data = np.array(blob.data).reshape([blob.channels, blob.height, blob.width])
    print (data.shape)
    print(data[0])
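# Display the mean image: PIL expects an (H, W, 3) uint8 array, so move channels last.
# The colors are only correct if the channels are stored in RGB order.
from PIL import Image
from IPython.display import display
mean_hwc = np.clip(data.transpose(1, 2, 0), 0, 255).astype(np.uint8)
display(Image.fromarray(mean_hwc))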

which outputs:

(1, 3, 256, 256)
(3, 256, 256)

What I have understood is that the file contains the means, and since the images in question have 3 channels, there is a mean for each channel.

However, I was expecting a single value per channel; instead I found a 256x256 array. Does this mean that a mean was taken for each pixel of each channel?

Another question: I want to use this NN with OpenCV, which uses BGR instead of RGB. How can I tell whether the 3x256x256 mean is in RGB or BGR order?

The link to the model is this. The model I am looking at is contained in the zip file CNRPark-Trained-Models.zip within the folder: mAlexNet-on-CNRPark.

Solution

However, I was expecting a single value per channel; instead I found a 256x256 array. Does this mean that a mean was taken for each pixel of each channel?

Exactly. Judging by its shape, mean.binaryproto stores the average image of some dataset: for each channel, the value at each pixel position is the mean of that pixel (feature) over all images in the dataset.

This should not be confused with the mean pixel, which, as you stated, is a single value for each channel.

For example, the mean pixel was adopted in Very Deep Convolutional Networks for Large-Scale Image Recognition. According to the paper:

The only pre-processing we do is subtracting the mean RGB value, computed on the training set, from each pixel

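In other words, if you consider an RGB image to be 3 feature arrays of size N x N, the average image keeps one mean per feature (3 x N x N values, each averaged over the images in the dataset), while the mean pixel averages over all features of a channel as well (3 values, one per channel).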
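To make the difference concrete, here is a minimal NumPy sketch. The `images` array is a made-up stand-in for a training set (random values, tiny 4 x 4 images), not the CNRPark data; only the axes being averaged matter.

```python
import numpy as np

# Hypothetical stand-in for a training set: 10 images, 3 channels, 4x4 pixels.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(10, 3, 4, 4)).astype(np.float64)

# Average image: mean over the image axis only, one value per pixel per channel.
average_image = images.mean(axis=0)        # shape (3, 4, 4)

# Mean pixel: mean over images and both spatial axes, one value per channel.
mean_pixel = images.mean(axis=(0, 2, 3))   # shape (3,)

# The mean pixel is also the spatial mean of the average image.
mean_pixel_from_average = average_image.mean(axis=(1, 2))

print(average_image.shape)
print(mean_pixel.shape)
print(np.allclose(mean_pixel, mean_pixel_from_average))
```

So a 3 x 256 x 256 mean file can always be reduced to a 3-value mean pixel by averaging over its last two axes, but not the other way around.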

Another question: I want to use this NN with OpenCV, which uses BGR instead of RGB. How can I tell whether the 3x256x256 mean is in RGB or BGR order?

I doubt the binary file you are reading stores any information about its color format, but a practical way to figure it out is to plot the mean image with matplotlib and see whether the colors make sense.

Take face images, for example: if the red and blue channels are swapped, the skin tone will look bluish.

In fact, the image above is an example of an average image (of face images) :)
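A rough sketch of that visual check follows. The helper name `mean_to_rgb_image` and the `fake_mean` array are made up for illustration; in practice you would pass the (3, 256, 256) array read from mean.binaryproto instead.

```python
import numpy as np

def mean_to_rgb_image(mean_chw, stored_as_bgr):
    """Convert a (3, H, W) mean array into an (H, W, 3) uint8 RGB image."""
    hwc = np.transpose(mean_chw, (1, 2, 0))   # channels-first -> channels-last
    if stored_as_bgr:
        hwc = hwc[:, :, ::-1]                 # reverse the channel axis: BGR -> RGB
    return np.clip(hwc, 0, 255).astype(np.uint8)

# Hypothetical stand-in for the real mean: channel 0 is bright, the others are dark.
fake_mean = np.zeros((3, 256, 256), dtype=np.float32)
fake_mean[0] = 200.0

as_rgb = mean_to_rgb_image(fake_mean, stored_as_bgr=False)  # channel 0 ends up as red
as_bgr = mean_to_rgb_image(fake_mean, stored_as_bgr=True)   # channel 0 ends up as blue
print(as_rgb[0, 0], as_bgr[0, 0])
```

Both results can be passed to matplotlib's `plt.imshow`, which expects RGB; whichever interpretation gives plausible colors for the dataset is probably the right one.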

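You could also assume it is BGR. Caffe's own tools for building datasets read images with OpenCV, which uses this color format, so a mean computed from such a dataset is usually in BGR order. This is still only an assumption.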

However, the correct way to find out how this mean.binaryproto was generated is by looking at their repositories or by asking the owner of the model.
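Once the channel order is known, subtracting the mean from an OpenCV image could look roughly like the sketch below. The function name `preprocess_bgr` is made up, the image is assumed to be already resized to the mean's height and width, and the assumption that the network expects BGR input has to be checked against the model itself.

```python
import numpy as np

def preprocess_bgr(image_hwc_bgr, mean_chw, mean_is_bgr=True):
    """Subtract a (3, H, W) mean image from an (H, W, 3) BGR image.

    Returns a (3, H, W) float32 array in BGR order.
    """
    chw = np.transpose(image_hwc_bgr, (2, 0, 1)).astype(np.float32)
    mean = np.asarray(mean_chw, dtype=np.float32)
    if not mean_is_bgr:
        mean = mean[::-1]   # reverse the channel axis: RGB -> BGR
    if chw.shape != mean.shape:
        raise ValueError(f"image shape {chw.shape} does not match mean shape {mean.shape}")
    return chw - mean

# Synthetic example: a flat BGR image and a flat mean stored in RGB order.
image = np.empty((256, 256, 3), dtype=np.uint8)
image[:, :] = (10, 20, 30)                       # B, G, R
mean_rgb = np.empty((3, 256, 256), dtype=np.float32)
mean_rgb[0], mean_rgb[1], mean_rgb[2] = 1.0, 2.0, 3.0   # R, G, B

result = preprocess_bgr(image, mean_rgb, mean_is_bgr=False)
print(result[:, 0, 0])
```

With a real image, `image` would come from `cv2.imread` followed by `cv2.resize` to 256 x 256.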