I'd like to accumulate gradients across several batches. Training with iter_size = 2 and batch_size = 16 should give the same result as iter_size = 1 and batch_size = 32. I suspect I've missed something in my code, because gradParams are not the same in the two cases. I would really appreciate any help finding the problem. Here is my code:
local params, gradParams = net:getParameters()
local iter_size = 2
local batch_size = 16
local iter = 0

net:zeroGradParameters()

for i, input, target in trainset:sampleiter(batch_size) do
    iter = iter + 1

    -- forward
    local input = input:cuda()
    local target = target:cuda()
    local output = net:forward(input)
    local loss = criterion:forward(output, target)

    -- backward (gradients accumulate in gradParams, since zeroGradParameters
    -- is only called once every iter_size iterations)
    local gradOutput = criterion:backward(output, target)
    local gradInput = net:backward(input, gradOutput)

    -- update once every iter_size iterations
    if iter == iter_size then
        gradParams:mul(1.0 / iter_size)  -- average the accumulated gradients
        net:updateGradParameters(0.9)
        net:updateParameters(0.01)
        iter = 0
        net:zeroGradParameters()
    end
end
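For reference, this is roughly how I compared the two settings. gradsA and gradsB are just placeholder names for gradParams:clone() snapshots taken right before net:updateParameters in each run; the comparison also assumes the criterion averages the loss over the batch (e.g. sizeAverage = true), so that dividing by iter_size puts both runs on the same scale:

-- gradsA: snapshot of gradParams with iter_size = 2, batch_size = 16
-- gradsB: snapshot of gradParams with iter_size = 1, batch_size = 32
-- (both taken just before net:updateParameters in their respective runs)
local diff = (gradsA - gradsB):abs():max()
print(string.format('max abs gradient difference: %e', diff))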
It is also worth mentioning that I manually set the random seed for determinism when comparing results, so the difference is not due to random initialization of the network.
The problem was due to sampling: sampleiter returned images in a different order for different batch sizes, so the batches in the two cases contained different images, and therefore the accumulated gradients differed.
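One way to remove this sampling difference (a sketch, not necessarily what you should run as-is; trainset:size() and trainset:get(indices) are placeholders for whatever your dataset actually exposes) is to shuffle the indices once with a fixed seed and iterate over that fixed order in both runs:

torch.manualSeed(1234)                           -- fix the shuffle itself
local order = torch.randperm(trainset:size())    -- one fixed ordering, reused for both runs

for start = 1, order:size(1) - batch_size + 1, batch_size do
    local indices = order:narrow(1, start, batch_size):long()
    local input, target = trainset:get(indices)  -- placeholder batch fetch by index
    -- ... same forward / backward / accumulation loop body as above ...
end

With the image order fixed, both settings process exactly the same samples in the same order, so the accumulated gradients can be compared directly.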