Does TensorFlow support different quantization bit-widths for different layers, or must the same technique be applied to the whole model?
For example, let's say I perform 16-bit quantization at layer n. Can I perform 8-bit quantization at layer n+1?
No, as of now there is no option to define a different dtype for different layers of a model.
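For instance, with TFLite post-training quantization the scheme is chosen once on the converter and applied to the entire model. A minimal sketch, assuming an already-trained Keras model (the model below is just a placeholder):

```python
import tensorflow as tf

# Placeholder model; any trained Keras model would work the same way.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation='relu', input_shape=(4,)),
    tf.keras.layers.Dense(1),
])

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]

# The quantization choice is made once for the whole model, e.g. float16;
# there is no converter option to mix 16-bit and 8-bit between layers.
converter.target_spec.supported_types = [tf.float16]

tflite_model = converter.convert()
```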
As per the documentation of tf.keras.layers.Layer, the class from which all layers inherit:
dtype - The dtype of the layer's computations and weights (default of None means use tf.keras.backend.floatx in TensorFlow 2, or the type of the first input in TensorFlow 1).
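A minimal sketch of the default behavior the documentation describes: a layer's dtype falls back to the global tf.keras.backend.floatx() setting rather than being chosen per layer.

```python
import tensorflow as tf

# In TensorFlow 2, a layer created with dtype=None uses the global floatx.
print(tf.keras.backend.floatx())        # 'float32' by default

layer = tf.keras.layers.Dense(4)
print(layer.dtype)                      # 'float32', inherited from floatx

# set_floatx changes this default globally, for all layers created afterwards.
tf.keras.backend.set_floatx('float16')
print(tf.keras.layers.Dense(4).dtype)   # 'float16'
```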