
Why is the SNPE SDK so slow?


I tried the example provided by Qualcomm here:

https://github.com/globaledgesoft/deeplabv3-application-using-neural-processing-sdk

https://github.com/globaledgesoft/deeplabv3-application-using-neural-processing-sdk/blob/master/AndroidApplication/app/src/main/java/com/qdn/segmentation/tasks/SegmentImageTask.java

According to the comment, this piece of code should complete in 31 ms on GPU16:

    // [31ms on GPU16, 50ms on GPU] execute the inference
    outputs = mNeuralnetwork.execute(mInputTensorsMap);
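
For context, that call sits at the end of a longer setup. Below is a minimal sketch of the surrounding flow, assuming the SNPE Java API; the model path and the input layer name "sub_7" are placeholders rather than values from the repo, and "GPU16" presumably refers to the GPU_FLOAT16 runtime:

    import android.app.Application;
    import com.qualcomm.qti.snpe.FloatTensor;
    import com.qualcomm.qti.snpe.NeuralNetwork;
    import com.qualcomm.qti.snpe.SNPE;

    import java.io.File;
    import java.io.IOException;
    import java.util.Collections;
    import java.util.Map;

    final class InferenceSketch {
        static Map<String, FloatTensor> run(Application app, float[] pixels) throws IOException {
            // The runtime order decides where each layer executes; layers the GPU
            // runtime does not support silently fall back to the CPU.
            final NeuralNetwork network = new SNPE.NeuralNetworkBuilder(app)
                    .setModel(new File("/sdcard/deeplabv3.dlc"))        // placeholder path
                    .setRuntimeOrder(NeuralNetwork.Runtime.GPU_FLOAT16, // "GPU16"
                                     NeuralNetwork.Runtime.GPU,
                                     NeuralNetwork.Runtime.CPU)
                    .build();

            // Fill the input tensor ("sub_7" is an assumed layer name).
            final FloatTensor input = network.createFloatTensor(
                    network.getInputTensorsShapes().get("sub_7"));
            input.write(pixels, 0, pixels.length);

            // The timed call from the snippet above.
            return network.execute(Collections.singletonMap("sub_7", input));
        }
    }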

For me, the same example takes 14 seconds. I am using the Open-Q 845 HDK development kit.

I asked my professor, and he said that the app I am installing is not trusted by the development kit's firmware, which is why it takes so much time to execute. He suggested that I rebuild the firmware with my app installed as a system app. What other reasons could there be?


Solution

  • Yes, this is very confusing; I ran into the same problem. What I noticed is that, on my device at least (Snapdragon 835), ResizeBilinear_2 and ArgMax take an insane amount of time. If you disable CPU fallback (see the first sketch after this answer), you will see that ResizeBilinear_2 is actually not supported, since the DeepLab implementation uses align_corners=true.

    If you pick ResizeBilinear_1 as the output layer, there will be a significant improvement in inference time, with the trade-off that you lose the bilinear resize and ArgMax layers and have to implement them yourself (the second sketch after this answer shows a CPU-side argmax).

    Even then, using the GPU I was only able to reach a runtime of around 200 ms. With the DSP I did manage to get around 100 ms.

    Also, be sure that your kit has OpenCL support; otherwise the GPU runtime won't work, as far as I know.

    Side note: I'm currently still testing things with DeepLab + SNPE as well. Comparing it with the TFLite GPU delegate, I noticed some differences in the output. While SNPE is in general about twice as fast, there are a lot of segmentation artifacts, which can result in an unusable model. Check this out: https://developer.qualcomm.com/forum/qdn-forums/software/snapdragon-neural-processing-engine-sdk/34844

    What I have found so far is that if you drop the output stride to 16, not only do you get double the inference speed, the artifacts also seem to be less visible. Of course, you lose some accuracy by doing so. Good luck!
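
    Regarding disabling CPU fallback: a minimal sketch, assuming the setCpuFallbackEnabled option of the SNPE 1.x Java NeuralNetworkBuilder (verify the exact method against your SDK version):

        // GPU only, with CPU fallback disabled: layers the GPU runtime cannot
        // handle (here, ResizeBilinear_2 with align_corners=true) now fail
        // loudly instead of silently running on the slow path.
        final NeuralNetwork gpuOnly = new SNPE.NeuralNetworkBuilder(app)
                .setModel(new File("/sdcard/deeplabv3.dlc"))  // placeholder path
                .setRuntimeOrder(NeuralNetwork.Runtime.GPU)
                .setCpuFallbackEnabled(false)                 // check your SDK version for this API
                .build();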
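
    And regarding implementing the dropped layers yourself: a minimal sketch of the per-pixel argmax, assuming the ResizeBilinear_1 output is read out of its FloatTensor in NHWC order as [1, height, width, numClasses] (verify the shape and layout against your own .dlc):

        /** Replaces the model's ArgMax layer: picks the highest-scoring class per pixel. */
        static int[] argmaxLabels(float[] logits, int height, int width, int numClasses) {
            final int[] labels = new int[height * width];
            for (int p = 0; p < height * width; p++) {
                int best = 0;
                float bestScore = logits[p * numClasses];
                for (int c = 1; c < numClasses; c++) {
                    final float score = logits[p * numClasses + c];
                    if (score > bestScore) {
                        bestScore = score;
                        best = c;
                    }
                }
                labels[p] = best;
            }
            return labels;
        }

    Upscaling the logits bilinearly before the argmax (or nearest-neighbor scaling the resulting label map) would then stand in for the dropped ResizeBilinear_2.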