This TensorFlow guide gives some insight into the 8-bit representation of neural network weights and activations. It maps the float32 range [min, max] onto the 8-bit range by mapping the float32 min to 0 and the float32 max to 255. This means the additive identity (0) maps to a non-zero value, and even the multiplicative identity (1) may map to a value other than 1 in the 8-bit representation. My questions are:
After losing these identities, how is arithmetic performed in the new representation? For addition/subtraction, we can recover the approximate float32 number after appropriate scaling and offsetting.
How do we convert the result of a multiplication in 8-bit format back to the native float32 format?
There are some more details on how the quantization process works in practice here: http://www.oreilly.com/data/free/building-mobile-applications-with-tensorflow.csp
We'll be updating the tensorflow.org documentation soon too. To specifically answer #2: you get a new min/max float range for your 32-bit accumulated result, which you can use to convert back to floats.
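A sketch of that conversion in numpy (helper names are mine, not a TensorFlow API): subtract each operand's zero point, multiply in int32, and the accumulator's effective scale is the product of the two input scales, which defines the new float range for converting back.

```python
import numpy as np

def quantize(x, x_min, x_max):
    # Affine quantization with an explicit zero point (the 8-bit code
    # that represents float 0.0).
    scale = (x_max - x_min) / 255.0
    zero_point = int(round(-x_min / scale))
    q = np.clip(np.round(x / scale) + zero_point, 0, 255).astype(np.uint8)
    return q, scale, zero_point

a = np.array([0.5, -0.25], dtype=np.float32)
b = np.array([2.0, 4.0], dtype=np.float32)
qa, sa, za = quantize(a, -1.0, 1.0)
qb, sb, zb = quantize(b, -4.0, 4.0)

# Multiply in int32 after removing the zero points; the accumulated
# result's scale is sa * sb, so its float range is
# [acc_min * sa * sb, acc_max * sa * sb].
acc = (qa.astype(np.int32) - za) * (qb.astype(np.int32) - zb)
result = acc.astype(np.float32) * (sa * sb)   # back to float32
```

Here `result` comes out close to the true products 1.0 and -1.0, with error on the order of the quantization step.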