Tags: pytorch, tensorflow-lite, onnx, quantization, quantization-aware-training

Network quantization: why do we need "zero_point"? Why doesn't symmetric quantization need a "zero point"?


I have been Googling for days, but still can't find the answer I need. There must be some misunderstanding on my part. Could you please help me out?

1. Why do we need "zero_point"?

quantization: q = round(r / scale) + zero_point

I think that the zero_point (as an offset) shifts the scaled data to a proper position. For example, in the figure below, for unsigned 2-bit quantization, the zero point shifts [-1, 2] to {0, 1, 2, 3}:

[figure: unsigned 2-bit quantization, shifting the real range [-1, 2] onto the quantized range {0, 1, 2, 3}]
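To check my understanding, here is a tiny numeric sketch of that mapping (my own code, assuming the usual conventions scale = (rmax - rmin) / (qmax - qmin) and zero_point = qmin - round(rmin / scale)):

    import numpy as np

    def affine_quantize(r, scale, zero_point, qmin, qmax):
        # q = round(r / scale) + zero_point, clamped to the quantized range
        q = np.round(r / scale) + zero_point
        return np.clip(q, qmin, qmax).astype(np.int64)

    # Unsigned 2-bit example: real range [-1, 2] -> quantized range [0, 3]
    rmin, rmax = -1.0, 2.0
    qmin, qmax = 0, 2**2 - 1                      # [0, 3]
    scale = (rmax - rmin) / (qmax - qmin)         # = 1.0
    zero_point = qmin - int(round(rmin / scale))  # = 1
    r = np.array([-1.0, 0.0, 1.0, 2.0])
    print(affine_quantize(r, scale, zero_point, qmin, qmax))  # [0 1 2 3]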

Am I right about this?

If I am wrong, please help correct me;

If I am right, then the zero point is necessary here (in this asymmetric scheme), so why did Jacob say in IAO, Section 2.1, that the zero-point is for zero-padding? It seems to me that this is just a consequence, not the root reason.

[screenshot: excerpt from IAO, Section 2.1]


2. Why doesn't symmetric quantization need "zero point"?

In the Google white paper and some blogs, it is said that symmetric quantization does not need a zero point (since zero_point = 0):

[screenshot: formulas (7)-(9) from the white paper]

I can understand this for signed quantization, since both the floating-point range and the quantized range are symmetric, making zero_point = 0.

However, how can we ignore the zero_point in unsigned quantization, where the quantized range [0, 2^b - 1] is not symmetric? In this situation, it seems to me that we must have a positive zero point to shift the scaled data into [0, 2^b - 1], as in the figure below:

[figure: scaled data shifted into [0, 2^b - 1] by a positive zero point]
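To make the concern concrete, here is a small sketch (my own code, not from the white paper) of forcing zero_point = 0 with an unsigned 2-bit range on data that contains a negative value:

    import numpy as np

    # Symmetric quantization forced onto an unsigned 2-bit range [0, 3],
    # i.e. zero_point = 0, applied to data that includes a negative value.
    x = np.array([-1.0, 0.0, 1.0, 2.0])
    scale = np.abs(x).max() / (2**2 - 1)       # = 2/3
    q = np.clip(np.round(x / scale), 0, 2**2 - 1)
    print(q)          # [0. 0. 2. 3.] -- the -1.0 was clamped to 0
    print(q * scale)  # [0. 0. 1.33 2.] -- the negative value is lost

This shows the clamping I am worried about: without a positive zero point, the negative part of the real range simply cannot be represented.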


Solution

  • I haven't found or received a definitive answer yet.

    But I have convinced myself as follows:

    1. Why do we need "zero_point"?

    The zero_point is definitely an offset or bias, shifting the scaled data to a proper position. There should be no doubt about that.

    But what "the motivation" that Jocab methioned is that "Z is of the same tye as quantized q", instead of "having a zero point". The former makes sure the real "0" is quantized without error, thus when inferring in the quantized manner, it is the zero point (the same type as q) that is padded (instead of the value "0") in zero_padding (wihout error).

    2. Why doesn't symmetric quantization need "zero point"?

    I think the "if signed" and "if un-signed" in the formulas (7)-(9) of white paper are talking about the signs of x, i.e., the real, unquantized, floating-point values, instead of the quantized one. This means, signed floating-point values are quantized to signed fixed-point integer, with zero-point=0; and unsigned floating-point to unsigned fixed-point interger, with zero-point=0 as well.