It seems post-training quantization works for some model structures and not for others. For example, when I run my code with
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(10)
])
# Train the digit classification model
model.compile(optimizer='adam',
              loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
model.fit(train_images, train_labels, epochs=2)
and then run post-training quantization with
converter = tf.lite.TFLiteConverter.from_keras_model(model)
# This enables quantization
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.int8]
# This ensures that if any ops can't be quantized, the converter throws an error
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
# These set the input and output tensors to uint8 (added in r2.3)
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
# And this sets the representative dataset so we can quantize the activations
converter.representative_dataset = representative_data_gen
tflite_model = converter.convert()
with open('my_mnist_quant.tflite', 'wb') as f:
    f.write(tflite_model)
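For reference, representative_data_gen yields sample inputs that the converter uses to calibrate the activation ranges. A minimal sketch of such a generator, assuming train_images is the MNIST training set scaled to [0, 1]:

def representative_data_gen():
    # Yield ~100 calibration samples; each element is a list of input
    # tensors with a batch dimension, matching the model's (28, 28) input.
    for image in train_images[:100]:
        yield [tf.expand_dims(tf.cast(image, tf.float32), axis=0)]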
the
! edgetpu_compiler my_mnist_quant.tflite
command works perfectly fine and creates a TFLite model with performance comparable to the model I trained on the server.
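For what it's worth, the input and output types of the converted model can be inspected with the standard TFLite Interpreter API before running the compiler; after full-integer quantization both should report uint8:

interpreter = tf.lite.Interpreter(model_path='my_mnist_quant.tflite')
interpreter.allocate_tensors()
print('input type:', interpreter.get_input_details()[0]['dtype'])
print('output type:', interpreter.get_output_details()[0]['dtype'])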
However, when I changed the model to
model = keras.Sequential([
    keras.layers.InputLayer(input_shape=(28, 28)),
    keras.layers.Reshape(target_shape=(28, 28, 1)),
    keras.layers.Conv2D(64, kernel_size=3, activation='relu', input_shape=(28, 28, 1)),
    keras.layers.Flatten(),
    keras.layers.Dense(10, activation='softmax')
])
all I did was add a convolutional layer and a reshaping layer. With this model, quantization ran without errors, but when I tried to compile the result with edgetpu_compiler, it complained that my model was not quantized, even though I ran exactly the same conversion code as for the first model.
Can anyone explain why this error occurs? Does the model's structure affect whether it can be quantized?
If you are using tf-nightly or a newer version, the new MLIR converter may be enabled, and it isn't supported by the Edge TPU compiler yet. This can cause some strange errors. Try turning it off by adding:
converter.experimental_new_converter = False
That may well be the cause of your issue!
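The flag just needs to be set before convert() is called. A minimal sketch, reusing the conversion code from your question:

converter = tf.lite.TFLiteConverter.from_keras_model(model)
# ... same quantization settings as in the question ...
converter.experimental_new_converter = False  # fall back to the old converter
tflite_model = converter.convert()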