I implemented INT8 engine inference using TensorRT. The training batch size is 50 and the inference batch size is 1.
But at inference time,
[outputs] = common.do_inference(context, bindings=bindings, inputs=inputs, outputs=outputs, stream=stream, batch_size=1)
returns an output of size 13680000. It should be 273600 (60 × 80 × 57); with FP32/FP16 the output size was 273600.
Why is the output 50 times larger (13680000 = 50 × 273600) when using INT8?
My inference code is:

import time
import numpy as np
import common  # buffer/stream helper module shipped with the TensorRT Python samples

with engine.create_execution_context() as context:
    fps_time = time.time()
    # Buffers are allocated according to the engine's bindings and max batch size.
    inputs, outputs, bindings, stream = common.allocate_buffers(engine)
    im = np.array(frm, dtype=np.float32, order='C')
    #im = im[:,:,::-1]  # optional channel reversal (BGR -> RGB), currently disabled
    inputs[0].host = im.flatten()
    [outputs] = common.do_inference(context, bindings=bindings, inputs=inputs, outputs=outputs, stream=stream, batch_size=1)
    outputs = outputs.reshape((60, 80, 57))
It is because the engine's max batch size is the training batch size of 50, so the output buffer is allocated for that batch size.
You need to reshape as outputs = outputs.reshape((50, 60, 80, 57))
and then take the [0] tensor; that is the result of running inference on a single image.
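A minimal sketch of the fix, assuming the engine was indeed built with a max batch size of 50 so that common.allocate_buffers sizes the host output buffer for 50 images (variable names follow the code above):

# allocate_buffers sizes host/device memory for engine.max_batch_size results,
# even though batch_size=1 is passed to do_inference.
[outputs] = common.do_inference(context, bindings=bindings, inputs=inputs, outputs=outputs, stream=stream, batch_size=1)
outputs = outputs.reshape((50, 60, 80, 57))  # (engine max batch size, H, W, C)
result = outputs[0]                          # prediction for the single input image

Alternatively, rebuilding the INT8 engine with a max batch size of 1 should make allocate_buffers return an output of size 273600 directly, so no batch-dimension reshape is needed.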