I want to understand the basic operations performed in a convolution layer of a quantized model in TensorFlow Lite.
As a baseline, I chose a pretrained TensorFlow model, EfficientNet-lite0-int8, and used a sample image as input for the model's inference. I then extracted the output tensor of the first fused ReLU6 convolution layer and compared it with the output of my own Python implementation of that layer.
The deviation between the two tensors was large, and something I cannot explain is that TensorFlow's output tensor was not in the range [0, 6] as expected (I expected that because of the ReLU6 fused into the Conv layer).
Could you please give me a more detailed description of how a quantized Conv2D layer with fused ReLU6 operates in TensorFlow Lite?
After carefully studying TensorFlow's GitHub repository, I found the file kernel_util.cc and the function CalculateActivationRangeUint8. Using this function, I understood why the quantized fused ReLU6 Conv2D layer's output tensor is clipped not to [0, 6] but to [-128, 127]: the ReLU6 bounds 0.0 and 6.0 are mapped into the quantized domain using the output tensor's scale and zero point, so the clamp is applied to the quantized int8 values. When the quantized equivalent of 6.0 falls at or beyond the int8 limits, the effective clamp becomes the full [-128, 127] range, and the [0, 6] interval only reappears after dequantization. For the record, I also managed to implement the Conv2D layer's operation in Python in a few simple steps.
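The logic can be sketched in Python roughly as follows. The function and parameter names here are my own, and per-tensor int8 quantization is assumed; this mirrors what kernel_util.cc does rather than being TFLite's exact code:

```python
import numpy as np

def activation_range_relu6_int8(out_scale, out_zero_point):
    """Map ReLU6's float bounds [0, 6] into the quantized int8 domain,
    mirroring TFLite's CalculateActivationRange* helpers."""
    qmin, qmax = -128, 127  # int8 limits
    act_min = max(qmin, out_zero_point + int(round(0.0 / out_scale)))
    act_max = min(qmax, out_zero_point + int(round(6.0 / out_scale)))
    return act_min, act_max

def quantized_conv_output(x_q, w_q, bias, x_zp, w_zp,
                          x_scale, w_scale, out_scale, out_zp):
    """Compute one output value of a quantized Conv2D with fused ReLU6.
    x_q, w_q: int8 input patch and kernel of the same shape."""
    # 1. Integer accumulation in int32, with zero points subtracted.
    acc = np.sum((x_q.astype(np.int32) - x_zp) *
                 (w_q.astype(np.int32) - w_zp)) + bias
    # 2. Requantize with the effective scale, add the output zero point.
    out = int(round(float(acc) * (x_scale * w_scale) / out_scale)) + out_zp
    # 3. Fused ReLU6 clamps in the *quantized* domain, not to [0, 6].
    act_min, act_max = activation_range_relu6_int8(out_scale, out_zp)
    return int(np.clip(out, act_min, act_max))
```

If, for example, the output tensor is quantized with scale 6/255 and zero point -128 (a natural quantization of a ReLU6 output range), then act_min = -128 and act_max = 127, which is exactly why the raw tensor spans the full int8 range; dequantizing with scale * (q - zero_point) recovers values in [0, 6].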