Tags: android, pytorch, quantization

Unable to load quantized pytorch mobile model on android


I am struggling to get my quantized PyTorch Mobile model (a custom MobileNetV3) running on Android. I followed this tutorial https://pytorch.org/tutorials/advanced/static_quantization_tutorial.html and managed to quantize my model without any problems. However, when I try to load the model via module = Module.load(assetFilePath(this, MODEL_NAME)); I get the following exception:

Unknown builtin op: quantized::linear_unpack_fp16.

Could not find any similar ops to quantized::linear_unpack_fp16. This op may not exist or may not be currently supported in TorchScript.

Why are there even float16 values in the quantized model? I thought quantization would replace all float32 values with qint8/quint8. Any ideas on how this can be fixed?

This is how the PyTorch model was quantized and saved:

import torch

# configure static quantization for the mobile (qnnpack) backend
model.eval()
model.qconfig = torch.quantization.get_default_qconfig(backend='qnnpack')
torch.quantization.prepare(model, inplace=True)
# (calibration with representative inputs would normally run here)
torch.quantization.convert(model, inplace=True)
# trace and save the quantized model for PyTorch Mobile
traced_script_module = torch.jit.trace(model, dummyInput)
traced_script_module.save("model/modelQuantized.pt")
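
For reference, one way to check which operators the exported TorchScript module actually needs at runtime (and to spot ops like quantized::linear_unpack_fp16 that the Android runtime would have to provide) is torch.jit.export_opnames; a minimal sketch, assuming the traced_script_module from the snippet above:

import torch

# list every operator the traced module will call; the pytorch_android
# runtime on the device has to implement all of them
for op in sorted(set(torch.jit.export_opnames(traced_script_module))):
    print(op)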

Solution

  • I found that this error was caused by a version mismatch between the PyTorch version used for model quantization and the pytorch_android library used to build the app.

    The model was quantized with pytorch 1.5.1, torchvision 0.6.1 and cudatoolkit 10.2.89, but I used org.pytorch:pytorch_android:1.4.0 for building the app.

    Switching to org.pytorch:pytorch_android:1.5.0 solved it.
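
    To avoid the same mismatch in the future, it can help to print the PyTorch version that exports the model and pick the matching org.pytorch:pytorch_android release; a minimal sketch on the Python side:

    import torch

    # the pytorch_android Gradle dependency should match this major.minor
    # version, e.g. torch 1.5.x -> org.pytorch:pytorch_android:1.5.0
    print(torch.__version__)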