I'm currently working on semantic segmentation. However, my dataset is really small:
I only have around 700 images. With data augmentation (for example, flipping), I could
bring that to 2100 images.
I'm not sure whether that is enough for my task (semantic segmentation with four
classes).
I want to use batch normalization and mini-batch gradient descent.
What really makes me scratch my head is that if the batch size is too small,
batch normalization doesn't work well, but with a larger batch size
it seems equivalent to full-batch gradient descent.
I wonder if there is something like a standard ratio between the number of samples and the batch size?
Let me first address the second part of your question, "strategy for neural network with small dataset". You may want to take a network pretrained on a larger dataset and fine-tune it on your smaller dataset. See, for example, this tutorial.
Second, you ask about the batch size. Indeed, a smaller batch makes the algorithm wander around the optimum, as in classical stochastic gradient descent; the symptom is noisy fluctuation in your losses. With a larger batch size, the trajectory towards the optimum is typically smoother. In any case, I suggest that you use an optimizer with momentum, such as Adam. That would aid the convergence of your training.
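To make the noise argument concrete, here is a minimal sketch in plain Python. It is not a real training loop: the per-sample "gradients" are hypothetical random numbers (mean 1.0, unit noise), and the function name `minibatch_gradient_spread` is made up for illustration. It only shows how the spread of the mini-batch gradient estimate shrinks as the batch grows, and disappears when the batch is the whole dataset.

```python
import random
import statistics


def minibatch_gradient_spread(batch_size, n_samples=2100, n_trials=500, seed=0):
    """Std. dev. of the mini-batch gradient estimate across random batches."""
    rng = random.Random(seed)
    # Hypothetical per-sample gradients: true mean 1.0 with unit Gaussian noise.
    per_sample_grads = [rng.gauss(1.0, 1.0) for _ in range(n_samples)]
    estimates = []
    for _ in range(n_trials):
        # Draw one mini-batch without replacement and average its gradients.
        batch = rng.sample(per_sample_grads, batch_size)
        estimates.append(statistics.fmean(batch))
    return statistics.stdev(estimates)


# The last entry (2100) is the full batch: every "batch" is the whole dataset.
for b in (4, 16, 64, 256, 2100):
    print(b, minibatch_gradient_spread(b))
```

In this toy setting the spread falls roughly like one over the square root of the batch size, so a batch of 16 or 32 out of 2100 samples still behaves very differently from full-batch gradient descent.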
Heuristically, keep the batch size as large as your GPU memory allows. If the GPU memory is not sufficient, reduce the batch size.
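That heuristic can be sketched as a simple halving search. Everything here is an assumption for illustration: `largest_fitting_batch_size` and `fake_training_step` are hypothetical names, the memory figures (350 MB per image, 8000 MB budget) are invented, and `MemoryError` stands in for whatever out-of-memory error your framework actually raises. In practice you would pass in a function that runs one real forward and backward pass.

```python
import math


def largest_fitting_batch_size(run_one_step, start=64, minimum=1):
    """Halve the batch size until run_one_step(batch_size) no longer raises MemoryError."""
    batch_size = start
    while batch_size >= minimum:
        try:
            run_one_step(batch_size)
            return batch_size
        except MemoryError:
            batch_size //= 2
    raise MemoryError("even the minimum batch size does not fit")


def fake_training_step(batch_size, mem_per_image_mb=350, budget_mb=8000):
    """Hypothetical stand-in for one training step with a fixed memory budget."""
    if batch_size * mem_per_image_mb > budget_mb:
        raise MemoryError("simulated out-of-memory")


batch_size = largest_fitting_batch_size(fake_training_step)
steps_per_epoch = math.ceil(2100 / batch_size)
print(batch_size, steps_per_epoch)  # prints: 16 132
```

With these made-up numbers the search settles on a batch size of 16, which gives 132 parameter updates per epoch on 2100 images, so training is still far from a single full-batch update per epoch.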