Tags: ios, tensorflow, deep-learning, gpu, quantization

Use .tflite with iOS and GPU


I have created a new .tflite model based on MobilenetV2. It works well without quantization, using the CPU on iOS. I should say that the TensorFlow team did a great job, many thanks.
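
For context, a float (non-quantized) .tflite file like this can be produced in a few lines; a minimal sketch using the stock Keras MobileNetV2, not the questioner's actual custom-trained model, with a placeholder output file name:

    import tensorflow as tf

    # Sketch only: a stock Keras MobileNetV2 exported to .tflite without
    # quantization. The actual model in the question is custom-trained,
    # and the output file name here is a placeholder.
    model = tf.keras.applications.MobileNetV2(input_shape=(224, 224, 3))
    converter = tf.lite.TFLiteConverter.from_keras_model(model)

    with open("mobilenet_v2.tflite", "wb") as f:
        f.write(converter.convert())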

Unfortunately, there is a problem with latency. I use an iPhone 5s to test my model, and I get the following results on CPU:

  1. 500 ms for MobilenetV2 with a 224×224 input image.

  2. 250-300 ms for MobilenetV2 with a 160×160 input image.

I used the following Podfile entry: pod 'TensorFlowLite', '~> 1.13.1'

That's not fast enough, so I read the TF documentation on optimization (post-training quantization). I suppose I need to use Float16 or UInt8 quantization and the GPU Delegate (see https://www.tensorflow.org/lite/performance/post_training_quantization). I used TensorFlow v2.1.0 to train and quantize my models.
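
For reference, float16 post-training quantization with the TF 2.1 converter looks roughly like the sketch below; "saved_model_dir" and the output file name are placeholders, not the exact paths used here:

    import tensorflow as tf

    # Sketch: float16 post-training quantization (TF 2.x converter API).
    # "saved_model_dir" is a placeholder for the exported MobilenetV2 model.
    converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.target_spec.supported_types = [tf.float16]

    with open("mobilenet_v2_fp16.tflite", "wb") as f:
        f.write(converter.convert())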

  1. Float16 quantization of weights (I used the MobilenetV2 model after Float16 quantization)

https://github.com/tensorflow/examples/tree/master/lite/examples/image_segmentation/ios

  • pod 'TensorFlowLiteSwift', '0.0.1-nightly'

No errors, but the model doesn't work

  • pod 'TensorFlowLiteSwift', '2.1.0'

2020-05-01 21:36:13.578369+0300 TFL Segmentation[6367:330410] Initialized TensorFlow Lite runtime.
2020-05-01 21:36:20.877393+0300 TFL Segmentation[6367:330397] Execution of the command buffer was aborted due to an error during execution. Caused GPU Hang Error (IOAF code 3)

  2. Full integer quantization of weights and activations

pod 'TensorFlowLiteGpuExperimental'

Code sample: https://github.com/makeml-app/MakeML-Nails/tree/master/Segmentation%20Nails

I used a MobilenetV2 model after uint8 quantization.
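
The corresponding full-integer conversion looks roughly like this (again a sketch: the directory name is a placeholder, real calibration images should replace the random tensors that merely keep it runnable, and the exact flags vary by TF version):

    import numpy as np
    import tensorflow as tf

    # Placeholder calibration data; real images from the training
    # distribution should be yielded here instead of random tensors.
    def representative_dataset():
        for _ in range(100):
            yield [np.random.rand(1, 224, 224, 3).astype(np.float32)]

    converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.representative_dataset = representative_dataset
    # Force integer-only ops; conversion fails if an op has no int8 kernel.
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]

    with open("mobilenet_v2_uint8.tflite", "wb") as f:
        f.write(converter.convert())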

On the iOS side, I set up the GPU delegate like this:

    GpuDelegateOptions options;
    options.allow_precision_loss = true;
    options.wait_type = GpuDelegateOptions::WaitType::kActive;

    // Create the Metal GPU delegate with the options above;
    // passing nullptr instead would use the defaults.
    delegate = NewGpuDelegate(&options);

    // Hand the supported parts of the graph over to the delegate.
    if (interpreter->ModifyGraphWithDelegate(delegate) != kTfLiteOk) {
        // Handle the error, e.g. fall back to CPU inference.
    }

Segmentation Live[6411:331887] [DYMTLInitPlatform] platform initialization successful
Loaded model 1
resolved reporter
Didn't find op for builtin opcode 'PAD' version '2'

Is it possible to use a quantized MobilenetV2 model on iOS somehow? Hopefully I made a mistake somewhere :) and it's possible.

Best regards, Dmitriy


Solution

  • Sorry for the outdated documentation: the GPU delegate should be included in TensorFlowLiteSwift 2.1.0. However, it looks like you're using the C API, so depending on TensorFlowLiteC would be sufficient.

    MobileNetV2 does work with the TFLite runtime on iOS, and if I recall correctly it doesn't have a PAD op. Can you attach your model file? With the information provided it's a bit hard to see what's causing the error. As a sanity check, you can get the quantized/non-quantized versions of MobileNetV2 from here: https://www.tensorflow.org/lite/guide/hosted_models
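
    A quick way to reproduce an op-resolution failure outside of iOS is to load the same file with the Python interpreter; a minimal sketch, where the model path is a placeholder:

        import tensorflow as tf

        # Construction/allocation raises if the file contains an op version
        # this runtime doesn't support (e.g. PAD v2 on an older runtime).
        interpreter = tf.lite.Interpreter(model_path="segmentation_model.tflite")
        interpreter.allocate_tensors()
        print(interpreter.get_input_details())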

    For an int8-quantized model: AFAIK the GPU delegate only works with FP32 and (possibly) FP16 inputs.
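
    If in doubt about which flavor a given .tflite file is, the input tensor's dtype is a quick check (again a sketch; the path is a placeholder):

        import tensorflow as tf

        interpreter = tf.lite.Interpreter(model_path="model.tflite")
        # numpy.float32 suggests a GPU-delegate candidate; numpy.uint8 or
        # numpy.int8 means integer-quantized, so expect CPU execution.
        print(interpreter.get_input_details()[0]["dtype"])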