Tags: python, tensorflow, tensorflow2.0, tensorflow-lite, quantization

Int8 quantization of an LSTM model: no matter which version, I run into issues


I want to quantize an LSTM model (int8 post-training quantization) using a representative dataset generator.

Questions

I start with the question, since this is quite a long post: have you actually managed to quantize an LSTM model to int8 with post-training quantization?

I tried different TF versions but always bumped into an error. Below are some of my attempts. Maybe you see a mistake I made or have a suggestion. Thanks.


Working Part

The model expects input of shape (batch, 1, 45). Running inference with the un-quantized model works fine. The model and CSV file can be found here:
csv file: https://mega.nz/file/5FciFDaR#Ev33Ij124vUmOF02jWLu0azxZs-Yahyp6PPGOqr8tok
modelfile: https://mega.nz/file/UAMgUBQA#oK-E0LjZ2YfShPlhHN3uKg8t7bALc2VAONpFirwbmys

import tensorflow as tf
import numpy as np
import pathlib as path
import pandas as pd

def reshape_for_Lstm(data):
    timesteps = 1
    samples = int(np.floor(data.shape[0] / timesteps))
    data = data.reshape((samples, timesteps, data.shape[1]))  # samples, timesteps, sensors
    return data

if __name__ == '__main__':

    # GET DATA
    data = pd.read_csv('./test_x_data_OOP3.csv', index_col=[0])
    data = np.array(data)
    data = reshape_for_Lstm(data)

    # LOAD MODEL
    saved_model_dir = path.Path.cwd() / 'model' / 'singnature_model_tf_2.7.0-dev20210914'
    model = tf.keras.models.load_model(saved_model_dir)

    # INFERENCE
    yhat, yclass = model.predict(data)
    Yclass = [np.argmax(yclass[i], 0) for i in range(len(yclass))]  # get final class

    print('all good')

The shape and dtype of the variable data are (20000, 1, 45) and float64.


Where it goes wrong

Now I want to quantize the model, but depending on the TensorFlow version I run into different errors.

The converter options I have tried are merged into the snippet below; the commented-out lines are alternatives I switched between (a sketch of batch_generator follows the snippet):

    converter=tf.lite.TFLiteConverter.from_saved_model('./model/singnature_model_tf_2.7.0-dev20210914')
    converter.representative_dataset = batch_generator
    converter.optimizations = [tf.lite.Optimize.DEFAULT]         

    converter.experimental_new_converter = False  
   
    #converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8] 
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8, tf.lite.OpsSet.TFLITE_BUILTINS]
    #converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS, tf.lite.OpsSet.SELECT_TF_OPS]
    
    #converter._experimental_lower_tensor_list_ops = False

    converter.target_spec.supported_types = [tf.int8]
    quantized_tflite_model = converter.convert()
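
The batch_generator assigned to representative_dataset is not shown above. For reference, here is a minimal sketch of the kind of generator I mean (my own helper, not an official API): it yields a few hundred float32 batches shaped like the model input, taken from the data array loaded in the working part.

    def batch_generator():
        # 'data' is the (20000, 1, 45) float64 array from the working part;
        # the calibrator expects float32 and one batch per yield, wrapped in a list
        for i in range(300):
            sample = data[i:i + 1].astype(np.float32)  # shape (1, 1, 45)
            yield [sample]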

TensorFlow 2.2

Using TF 2.2, as often suggested on GitHub, I run into unsupported operators from TFLite. I used a model created with TF 2.2 to ensure version support. Here, only TOCO conversion is supported.

Some of the operators in the model are not supported by the standard TensorFlow Lite runtime and are not recognized by TensorFlow.

The error does not depend on the converter.target_spec.supported_ops options, and I could not find a solution for it. allow_custom_ops only shifts the problem. There are quite a few GitHub issues on this (just some examples), but none of the suggested options worked.
One suggestion is to try the new MLIR converter; however, in 2.2 integer-only conversion was not yet implemented for MLIR.

So let's try a newer version.


TensorFlow 2.5.0

Then I tried a well-vetted version. Here, no matter which converter.target_spec.supported_ops I set, I run into the following error when using the MLIR conversion:

In calibrator.py:

ValueError: Failed to parse the model: pybind11::init(): factory function returned nullptr.

The suggested solution on GitHub is to use TF 2.2.0.

With TOCO conversion, I get the following error:

tensorflow/lite/toco/allocate_transient_arrays.cc:181] An array, StatefulPartitionedCall/StatefulPartitionedCall/model/lstm/TensorArrayUnstack/TensorListFromTensor, still does not have a known data type after all graph transformations have run. Fatal Python error: Aborted

I did not find anything on this error. Maybe it is solved in 2.6.


TensorFlow 2.6.0

Here, no matter which converter.target_spec.supported_ops I use, I run into the following error:

ValueError: Failed to parse the model: Only models with a single subgraph are supported, model had 5 subgraphs.

The model is a five-layer model, so it seems that each layer is seen as a subgraph. I did not find an answer on how to merge them into one subgraph. The issue apparently exists in 2.6.0 and is solved in 2.7, so let's try the nightly build.


TensorFlow 2.7-nightly (tried 2.7.0-dev20210914 and 2.7.0-dev20210921)

Here we have to use Python 3.7, as 3.6 is no longer supported.

We also have to use

converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS, tf.lite.OpsSet.SELECT_TF_OPS]

However, even though it is stated that

converter._experimental_lower_tensor_list_ops = False

should be set, it does not seem necessary.

The problem here is that, to my knowledge, tf.lite.OpsSet.SELECT_TF_OPS calls calibrator.py, and in calibrator.py the representative_dataset is expected to provide specific generator data. From line 93 onwards, in the _feed_tensor() function, the generator wants either a dict, list, or tuple. The tf.lite.RepresentativeDataset description (and the tflite class description) states that the dataset should look the same as the input of the model, which in my case (and most cases) is just a NumPy array with the correct dimensions.

Here I could try wrapping my data in a tuple; however, this does not seem right. Or is that actually the way to go? A small sketch of the formats that seem to be accepted follows below.
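
For illustration, and going by the dict/list/tuple check mentioned above, the sketch below shows the yield formats that should be accepted; only the wrapper around the NumPy array changes (the dict key is the model's input name, which I am merely assuming here):

    def representative_dataset():
        for i in range(300):
            sample = data[i:i + 1].astype(np.float32)  # one (1, 1, 45) batch
            yield [sample]                   # list: one entry per model input
            # yield (sample,)                # a tuple is handled the same way
            # yield {'input_1': sample}      # dict keyed by input name ('input_1' is assumed)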


Thanks so much for reading all this. If I find an answer, I will of course update the post.


Solution

  • I have the same problem as you and I'm still trying to solve it, but I noticed a couple of differences in our code, so sharing them could be useful.

    I'm using TF 2.7.0 and the conversion works fine when using:

    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS, tf.lite.OpsSet.SELECT_TF_OPS, tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    

    Anyway, as far as I know, using these options (as you mentioned) does not guarantee full quantization of the model, so you will likely not be able to deploy it completely on microcontrollers or Edge TPU systems such as the Google Coral.

    When using the conversion options recommended by the official guide for full integer quantization:

    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8] 
    

    The conversion fails.

    I recently succeeded in solving the problem! There is an extra line of code to add when configuring the converter:

    converter.target_spec.supported_types = [tf.int8]
    

    Here is the link to the tutorial I followed: https://colab.research.google.com/github/google-coral/tutorials/blob/master/train_lstm_timeseries_ptq_tf2.ipynb#scrollTo=EBRDh9SZVBX1
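
    Not from the tutorial, but as a quick sanity check you can load the converted model into the TFLite Interpreter and print the tensor dtypes to see how much of the graph actually ended up in int8 (a minimal sketch):

    import tensorflow as tf

    # quantized_tflite_model is the result of converter.convert()
    interpreter = tf.lite.Interpreter(model_content=quantized_tflite_model)
    interpreter.allocate_tensors()

    # Model-level input/output dtypes stay float32 unless inference_input_type /
    # inference_output_type are also set on the converter
    print(interpreter.get_input_details()[0]['dtype'])
    print(interpreter.get_output_details()[0]['dtype'])

    # Per-tensor dtypes show which parts of the graph were actually quantized
    for t in interpreter.get_tensor_details():
        print(t['name'], t['dtype'])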