Tags: python, tensorflow, keras, quantization-aware-training

Quantization-Aware Training for a TensorFlow Keras Model


I want to do quantization-aware training with my Keras model. I have tried the code below. I'm using TensorFlow 1.14.0.

import tensorflow as tf

train_graph = tf.Graph()
train_sess = tf.compat.v1.Session(graph=train_graph)
tf.compat.v1.keras.backend.set_session(train_sess)

with train_graph.as_default():
    tf.keras.backend.set_learning_phase(1)
    model = my_keras_model()

    # Insert fake-quantization nodes; start quantizing after 5 steps
    tf.contrib.quantize.create_training_graph(input_graph=train_graph, quant_delay=5)
    train_sess.run(tf.compat.v1.global_variables_initializer())

    model.compile(...)
    model.fit_generator(...)

    saver = tf.compat.v1.train.Saver()
    saver.save(train_sess, checkpoint_path)

It runs without errors.

However, the size of the saved model (h5 and ckpt) is exactly the same as that of the model without quantization.

Is this the right way to do it? How can I check whether the model is actually quantized?

Or is there a better way to quantize?


Solution

  • When you finish quantization-aware training and save your model to disk, it is not actually quantized yet. In other words, it is only "prepared" for quantization, and the weights are still float32. You have to further convert the model to TFLite for it to actually be quantized. You can do so with the following piece of code:

    # Note: from_keras_model is the TF 2.x API; on TF 1.x, use
    # tf.lite.TFLiteConverter.from_keras_model_file(keras_file) instead.
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]

    quantized_tflite_model = converter.convert()

    This will quantize your model with int8 weights and uint8 activations.
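    To see why this conversion shrinks the file by roughly 4x while the checkpoint did not, here is a minimal NumPy sketch of the affine quantization scheme that TFLite applies per tensor. The weight array is made up for illustration; it stands in for one layer's float32 parameters:

```python
import numpy as np

# Made-up float32 "weights" standing in for one layer's parameters.
weights = np.linspace(-1.0, 1.0, 1024).astype(np.float32)

# Affine (asymmetric) quantization: map the float range [lo, hi] onto
# the 256 levels of a single byte, with one scale/zero_point per tensor.
lo, hi = float(weights.min()), float(weights.max())
scale = (hi - lo) / 255.0
zero_point = int(round(-lo / scale))

q = np.clip(np.round(weights / scale) + zero_point, 0, 255).astype(np.uint8)

# Dequantizing recovers each float up to a rounding error bounded by the scale.
dequantized = (q.astype(np.float32) - zero_point) * scale

print(weights.nbytes // q.nbytes)                      # → 4 (1 byte per value instead of 4)
print(np.max(np.abs(weights - dequantized)) <= scale)  # → True
```

    Each value is stored in one byte plus a shared scale/zero_point pair per tensor, which is why the quantized .tflite file is about a quarter of the float32 model's size. The QAT checkpoint, by contrast, only carries the fake-quantization nodes that simulate this rounding during training, so its weights (and file size) stay float32.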

    Have a look at the official example for further reference.