
Tensorflow Quantization - Failed to parse the model: pybind11::init(): factory function returned nullptr


I'm working on a TensorFlow model to be deployed on an embedded system. For this purpose, I need to quantize the model to int8. The model is composed of three distinct models:

  1. CNN as a feature extractor
  2. TCN for temporal prediction
  3. FC/Dense layer as the final classifier.

I implemented the TCN starting from this post, with some modifications. In essence, the TCN is just a stack of dilated 1D convolutions (left-padded with (k - 1) * d zeros so the time dimension is preserved) plus Add operations for the skip connections.

import tensorflow as tf

## Define the TCN; its input shape is taken from the upstream tensor 'glue' (the CNN output)
tcn_input = tf.keras.Input(shape=tf.keras.backend.int_shape(glue)[1:])
# first causal conv for channel adaptation
k=1; d=1; padding = (k - 1) * d
# tcn_input_p = tf.pad(tcn_input, tf.constant([(0,0), (1,0), (0,0)]) * padding)
temp_block_input = tf.keras.layers.Conv1D(32, k, padding='valid', data_format='channels_last', name='adapt_conv')(tcn_input)

# TEMPORAL BLOCK 1
k=2; d=1; padding = (k - 1) * d
# temp_block_input_p = tf.pad(temp_block_input, tf.constant([(0,0), (1,0), (0,0)]) * padding)
temp_block_input_p = tf.keras.layers.ZeroPadding1D((padding, 0))(temp_block_input)
x = tf.keras.layers.Conv1D(32, k, padding='valid', data_format='channels_last', dilation_rate=d, activation='relu', name='conv1')(temp_block_input_p)
temp_block_input = tf.keras.layers.Add()([temp_block_input, x])

# TEMPORAL BLOCK 2
k=2; d=2; padding = (k - 1) * d
# temp_block_input_p = tf.pad(temp_block_input, tf.constant([(0,0), (1,0), (0,0)]) * padding)
temp_block_input_p = tf.keras.layers.ZeroPadding1D((padding, 0))(temp_block_input)
x = tf.keras.layers.Conv1D(32, k, padding='valid', data_format='channels_last', dilation_rate=d, activation='relu', name='conv2')(temp_block_input_p)
temp_block_input = tf.keras.layers.Add()([temp_block_input, x])

# TEMPORAL BLOCK 3
k=2; d=4; padding = (k - 1) * d
# temp_block_input_p = tf.pad(temp_block_input, tf.constant([(0,0), (1,0), (0,0)]) * padding)
temp_block_input_p = tf.keras.layers.ZeroPadding1D((padding, 0))(temp_block_input)
x = tf.keras.layers.Conv1D(32, k, padding='valid', data_format='channels_last', dilation_rate=d, activation='relu', name='conv3')(temp_block_input_p)
x = tf.keras.layers.Add()([temp_block_input, x])

tcn = tf.keras.Model(tcn_input, x, name='tcn')

tcn.summary()

I try to quantize the TCN with the following code (which works for the other models, e.g. the CNN):

converter = tf.lite.TFLiteConverter.from_keras_model(tcn)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

def representative_dataset():
    # generate calibration inputs for the TCN by passing the training data through the CNN
    for sample in x_train:
        yield [cnn(i) for i in sample]

converter.representative_dataset = representative_dataset
quant_model = converter.convert()

with open(os.path.join('models','tcn_q.bin'), 'wb') as f:
    f.write(quant_model)

And I get the error reported below. I also tried the following, without success:

  • Save the model in the SavedModel format and then use tf.lite.TFLiteConverter.from_saved_model(path) (see the sketch after the error message below)
  • Use tf.Add and tf.pad instead of the Keras API
  • Remove the Add operation to make the model sequential
Failed to parse the model: pybind11::init(): factory function returned nullptr.
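
For reference, the SavedModel attempt from the first bullet looked roughly like this (a sketch; the export path is a placeholder), and it fails with the same error on 2.4.0:

tcn.save(os.path.join('models', 'tcn_saved'))  # export in the SavedModel format

converter = tf.lite.TFLiteConverter.from_saved_model(os.path.join('models', 'tcn_saved'))
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
converter.representative_dataset = representative_dataset
quant_model = converter.convert()  # raises the same pybind11 error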

I could not find a solution so far, but I believe it should be possible to quantize this network, as the operations I use are basic and should be supported. I am also open to workarounds if anything comes to mind, but I'd like to understand which part is causing the issue.

As a side note, I also inspected the network with netron.app, and it seems each 1D convolution is transformed into a 2D convolution using additional Reshape, ExpandDims and BatchToSpace layers. I'm not sure whether this might be part of the issue.

[Netron graph: TCN without the Add layers]


Solution

  • As suggested by @JaesungChung, the problem seems to be solved using tf-nightly (I tested on 2.5.0-dev20210325).

    It's possible to obtain the same effect in 2.4.0 with a workaround: transform each Conv1D into a Conv2D that operates on a height-1 input and uses a flat (1, kernel_size) kernel, as sketched below.
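
A minimal sketch of that workaround, with illustrative shapes and layer names (the 128 time steps and 32 channels are placeholders, not the original model's): the time axis becomes the width of a height-1 input, so each causal Conv1D turns into a ZeroPadding2D plus a Conv2D with a (1, kernel_size) kernel, and the Add skip connections stay unchanged.

import tensorflow as tf

def causal_conv2d(x, filters, kernel_size, dilation, name):
    # left-pad the time axis (the width of the height-1 input) so the convolution stays causal
    pad = (kernel_size - 1) * dilation
    x = tf.keras.layers.ZeroPadding2D(((0, 0), (pad, 0)))(x)
    return tf.keras.layers.Conv2D(filters, (1, kernel_size), padding='valid',
                                  dilation_rate=(1, dilation), activation='relu',
                                  data_format='channels_last', name=name)(x)

# placeholder input: height 1, 128 time steps, 32 channels
tcn_input = tf.keras.Input(shape=(1, 128, 32))
x = tf.keras.layers.Conv2D(32, (1, 1), name='adapt_conv')(tcn_input)  # channel adaptation
y = causal_conv2d(x, 32, kernel_size=2, dilation=1, name='conv1')     # temporal block 1
x = tf.keras.layers.Add()([x, y])
y = causal_conv2d(x, 32, kernel_size=2, dilation=2, name='conv2')     # temporal block 2
x = tf.keras.layers.Add()([x, y])
y = causal_conv2d(x, 32, kernel_size=2, dilation=4, name='conv3')     # temporal block 3
x = tf.keras.layers.Add()([x, y])
tcn_2d = tf.keras.Model(tcn_input, x, name='tcn_2d')

The quantization code above can then be reused on tcn_2d, with the representative dataset reshaped to include the extra height-1 axis.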