neural-network deep-learning caffe gradient-descent imagenet

How to choose the batch size in Caffe

I understand from here that a bigger batch size gives more accurate results, but I'm not sure which batch size is "good enough". I guess bigger batch sizes will always be better, but it seems that beyond a certain point each increase in batch size yields only a slight improvement in accuracy. Is there a heuristic or rule of thumb for finding the optimal batch size?

Currently, I have 40,000 training samples and 10,000 test samples. My batch size is the default, which is 256 for training and 50 for testing. I am using an NVIDIA GTX 1080, which has 8 GB of memory.

Solution

Test-time batch size does not affect accuracy. Set it to the largest value that fits into memory so that each validation pass takes less time.

As for train-time batch size, you are right that larger batches yield more stable training. However, larger batches make each iteration take longer, and you get fewer backprop updates per epoch. So you do not want the batch size to be too large. Using the default values is usually a good strategy.
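To make the "fewer updates per epoch" point concrete, here is a rough sketch that counts weight updates per epoch for the 40,000 training samples in the question. The helper `updates_per_epoch` is a hypothetical name, and the batch sizes listed are only examples, not recommendations.

```python
import math


def updates_per_epoch(num_train_samples: int, batch_size: int) -> int:
    """One backprop update per batch, so count batches per pass over the data."""
    if num_train_samples <= 0 or batch_size <= 0:
        raise ValueError("sample count and batch size must be positive")
    return math.ceil(num_train_samples / batch_size)


# 40,000 training samples, as in the question.
for batch_size in (32, 64, 128, 256, 512, 1024):
    print(batch_size, updates_per_epoch(40000, batch_size))
```

At the default batch size of 256 the network gets 157 updates per epoch. Doubling the batch size to 512 halves that to 79, so the extra stability per update is paid for with fewer updates over the same data.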