
Why is using a batch to predict considered cheating when applying Batch Normalization?


In a post on Quora, someone says:

At test time, the layer is supposed to see only one test data point at a time, hence computing the mean / variance along a whole batch is infeasible (and is cheating).

But as long as the test data have not been seen by the network during training, isn't it OK to use several test images?

I mean, our network has been trained to predict using batches, so what is the issue with giving it batches?

If someone could explain what information our network gets from batches that it is not supposed to have, that would be great :)

Thank you


Solution

  • But as long as the test data have not been seen by the network during training, isn't it OK to use several test images?

    First of all, it's OK to use batches for testing. Second, in test mode batchnorm doesn't compute the mean or variance of the test batch. It uses the mean and variance it already has (let's call them mu and sigma**2), which are running estimates accumulated solely from the training data. The result of batch norm in test mode is that every tensor x is normalized to (x - mu) / sigma.

    At test time, the layer is supposed to see only one test data point at a time, hence computing the mean / variance along a whole batch is infeasible (and is cheating)

    I just skimmed through the Quora discussion; maybe this quote has a different context. But taken on its own, it's just wrong. No matter what the batch is, every tensor goes through the same transformation, because mu and sigma are not changed during testing, just like all the other variables. So there's no cheating there, as the sketch below illustrates.
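
To make this concrete, here is a minimal sketch using PyTorch's nn.BatchNorm1d (PyTorch is just an assumed example framework; the question doesn't name one). After switching the layer to eval mode, it normalizes with its stored running mean and variance, so a sample gets exactly the same output whether it is fed alone or as part of a batch:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# A single BatchNorm layer, run over random "training" batches so that it
# accumulates running estimates of the mean and variance (mu and sigma**2).
bn = nn.BatchNorm1d(num_features=4)

bn.train()
for _ in range(100):
    bn(torch.randn(32, 4) * 2.0 + 1.0)  # training data with mean ~1, std ~2

# Switch to test mode: from now on the layer uses its stored running
# statistics, not the statistics of whatever batch it happens to see.
bn.eval()

print("running mean:", bn.running_mean)
print("running var: ", bn.running_var)

x = torch.randn(8, 4)  # a "test" batch of 8 samples

with torch.no_grad():
    out_batched = bn(x)                                         # all 8 at once
    out_single = torch.cat([bn(xi.unsqueeze(0)) for xi in x])   # one at a time

# In eval mode the results are identical, because every sample is normalized
# with the same fixed statistics: (x - mu) / sqrt(var + eps) * gamma + beta.
print(torch.allclose(out_batched, out_single, atol=1e-6))  # True
```

If the same comparison is run in train mode, the outputs differ, because there the layer really does use the statistics of the current batch; that is the behavior the Quora quote is warning about, and it simply doesn't apply at test time.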