Tags: numpy, pytorch, quantization

Reproducing arithmetic with pytorch's quantized tensors with numpy operations


I would like to know what exact arithmetic operations I have to do to reproduce results of quantized operations in pytorch.

This is nearly a duplicate of: I want to use Numpy to simulate the inference process of a quantized MobileNet V2 network, but the outcome is different with pytorch realized one

But I would simplify it even further to the example of adding two quantized tensors. For addition of two quantized tensors in a ResNet architecture, I use nn.quantized.FloatFunctional().

self.skip_add = nn.quantized.FloatFunctional()

And during inference I can add two tensors via

out1 = self.skip_add.add(x1, x2)

where x1 and x2 are tensors of type torch.Tensor, quantized with the fbgemm backend during a post-training quantization procedure.

I expected out2_int = x1.int_repr() + x2.int_repr() to be the same as out1.int_repr() (possibly with clamping to the valid range). However, that is not the case. Below I dump example outputs.

So I wonder: how can I get out1 using integer operations?
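For context, a per-tensor affine quantized tensor encodes real values as real = scale * (q - zero_point). The sketch below (plain NumPy, using single elements and the scale/zero-point values taken from the dumps further down) illustrates why the raw integer sum cannot match: the two addends live on different integer grids.

```python
import numpy as np

# Affine dequantization: real = scale * (q - zero_point)
def dequantize(q, scale, zero_point):
    return scale * (q.astype(np.float64) - zero_point)

# One element from each tensor, with the scale/zero-point pairs
# printed in the dumps below
q1, s1, zp1 = np.array([73], dtype=np.uint8), 0.009925744496285915, 75
q2, s2, zp2 = np.array([64], dtype=np.uint8), 0.02780967578291893, 61

# The raw integer sum 73 + 64 = 137 is meaningless on its own:
# each tensor uses its own scale and zero point, so the integers
# must be dequantized before they can be added.
real_sum = dequantize(q1, s1, zp1) + dequantize(q2, s2, zp2)
print(real_sum)  # ≈ 0.0636, i.e. -0.0199 + 0.0834 from the dumps
```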

print(x1)
      ...,
      [-0.0596, -0.0496, -0.1390,  ..., -0.0596, -0.0695, -0.0099],
      [-0.0893,  0.0000, -0.0695,  ...,  0.0596, -0.0893, -0.0298],
      [-0.1092,  0.0099,  0.0000,  ..., -0.0397, -0.0794, -0.0199]]]],
   size=(1, 256, 14, 14), dtype=torch.quint8,
   quantization_scheme=torch.per_tensor_affine, scale=0.009925744496285915,
   zero_point=75)
print(x2)
      ...,
      [ 0.1390, -0.1669, -0.0278,  ..., -0.2225, -0.0556, -0.1112],
      [ 0.0000, -0.1669, -0.0556,  ...,  0.0556,  0.1112, -0.2781],
      [ 0.1390,  0.1669,  0.0278,  ...,  0.2225,  0.4171,  0.0834]]]],
   size=(1, 256, 14, 14), dtype=torch.quint8,
   quantization_scheme=torch.per_tensor_affine, scale=0.02780967578291893,
   zero_point=61)
print(x1.int_repr())
      ...,
      [69, 70, 61,  ..., 69, 68, 74],
      [66, 75, 68,  ..., 81, 66, 72],
      [64, 76, 75,  ..., 71, 67, 73]]]], dtype=torch.uint8)
print(x2.int_repr())
      ...,
      [66, 55, 60,  ..., 53, 59, 57],
      [61, 55, 59,  ..., 63, 65, 51],
      [66, 67, 62,  ..., 69, 76, 64]]]], dtype=torch.uint8)

print(out1)

      ...,
      [ 0.0904, -0.2109, -0.1808,  ..., -0.2712, -0.1205, -0.1205],
      [-0.0904, -0.1808, -0.1205,  ...,  0.1205,  0.0301, -0.3013],
      [ 0.0301,  0.1808,  0.0301,  ...,  0.1808,  0.3314,  0.0603]]]],
   size=(1, 256, 14, 14), dtype=torch.quint8,
   quantization_scheme=torch.per_tensor_affine, scale=0.03012925386428833,
   zero_point=56)
print(out1.int_repr())
      ...,
      [59, 49, 50,  ..., 47, 52, 52],
      [53, 50, 52,  ..., 60, 57, 46],
      [57, 62, 57,  ..., 62, 67, 58]]]], dtype=torch.uint8)
print(out2_int)
      [135, 125, 121,  ..., 122, 127, 131],
      [127, 130, 127,  ..., 144, 131, 123],
      [130, 143, 137,  ..., 140, 143, 137]]]], dtype=torch.uint8)

Solution

  • The answer is twofold:

    1. Integer operations are implemented taking into account that the int8 values of different tensors refer to different domains (each tensor carries its own scale and zero point). Convolution (or matrix-matrix multiplication in general) is implemented with respect to this fact, and my answer to I want to use Numpy to simulate the inference process of a quantized MobileNet V2 network, but the outcome is different with pytorch realized one worked for me.
    2. Addition in pytorch is implemented in floats: you need to dequantize both inputs to float, add them, and then requantize the result back to int.
    def manual_addition(xq1_int, scale1, zp1, xq2_int, scale2, zp2,
                        scale_r, zp_r):
        # Dequantize both inputs (np.float was removed from modern NumPy;
        # use np.float64 instead), add in float, then requantize with the
        # result's scale and zero point.
        xdq = scale1 * (xq1_int.astype(np.float64) - zp1)
        ydq = scale2 * (xq2_int.astype(np.float64) - zp2)
        zdq = xdq + ydq
        zq_manual_int = (zdq / scale_r).round() + zp_r
        return zq_manual_int  # clipping to [0, 255] might be needed
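As a sanity check, applying this recipe to the last row of each dump above (the function is repeated here so the snippet is self-contained, with explicit clipping to the quint8 range) reproduces out1.int_repr() exactly:

```python
import numpy as np

def manual_addition(xq1_int, scale1, zp1, xq2_int, scale2, zp2,
                    scale_r, zp_r):
    # Dequantize -> add in float -> requantize to the output's
    # scale/zero-point, clipped to the valid quint8 range.
    xdq = scale1 * (xq1_int.astype(np.float64) - zp1)
    ydq = scale2 * (xq2_int.astype(np.float64) - zp2)
    zdq = xdq + ydq
    zq = (zdq / scale_r).round() + zp_r
    return np.clip(zq, 0, 255).astype(np.uint8)

# Last rows of x1.int_repr() and x2.int_repr() from the dumps above
xq1 = np.array([64, 76, 75, 71, 67, 73], dtype=np.uint8)
xq2 = np.array([66, 67, 62, 69, 76, 64], dtype=np.uint8)

out = manual_addition(xq1, 0.009925744496285915, 75,
                      xq2, 0.02780967578291893, 61,
                      0.03012925386428833, 56)
print(out)  # [57 62 57 62 67 58] -- matches the last row of out1.int_repr()
```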