I have two TensorFlow Lite models, model A and model B (both are YOLOv2 Tiny models).
However, both on a mobile phone and on a computer, model B takes about 10x longer to run a prediction than model A, even though model B detects fewer classes and its file is smaller. Both models appear to take 416x416 input images and use floating-point numbers.
What could be the reason model A is so much faster than model B? And how can I find out where the difference comes from?
One complication is that I did not train model A myself, so I don't have its .cfg file with the full network configuration...
There can be several reasons why a model ends up slower than expected, so try the following two approaches to gain more insight.
Inspect both networks with a tool like Netron. You can load your FlatBuffer (TF Lite) model file and visualize the network architecture after the TF Lite conversion, which lets you see where the two models differ. If, for example, model B contains additional Reshape operations (or similar) compared to model A, that could well be the reason. You can get Netron at https://github.com/lutzroeder/netron. A programmatic alternative is sketched below.
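If you prefer a quick check from code rather than the Netron UI, the Python `tf.lite.Interpreter` exposes tensor metadata you can compare between the two models. A minimal sketch (the file names `model_a.tflite` / `model_b.tflite` are placeholders for your own files):

```python
import tensorflow as tf

def summarize(model_path):
    # Load the converted TF Lite model and allocate its tensors.
    interpreter = tf.lite.Interpreter(model_path=model_path)
    interpreter.allocate_tensors()

    # Input/output signatures: a shape or dtype mismatch between the
    # two models (e.g. float32 vs. uint8) is an immediate red flag.
    for detail in interpreter.get_input_details():
        print(model_path, "input:", detail["shape"], detail["dtype"])
    for detail in interpreter.get_output_details():
        print(model_path, "output:", detail["shape"], detail["dtype"])

    # The total tensor count is a rough proxy for graph complexity;
    # a much larger count in model B hints at extra operations.
    print(model_path, "tensor count:", len(interpreter.get_tensor_details()))

for path in ["model_a.tflite", "model_b.tflite"]:  # placeholder names
    summarize(path)
```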
Measure the time the model spends on each of its layers. For this you can use the TF Lite benchmark tool provided directly in the TensorFlow repository; see https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/tools/benchmark/README.md.
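The benchmark tool gives you the per-operator breakdown; for a quick end-to-end latency comparison on your computer you can also time the interpreter from Python. A minimal sketch (file names and run count are placeholders, and random input is fine since timing does not depend on image content):

```python
import time
import numpy as np
import tensorflow as tf

def mean_latency_ms(model_path, runs=50):
    interpreter = tf.lite.Interpreter(model_path=model_path)
    interpreter.allocate_tensors()

    # Feed random data with the model's own input shape and dtype.
    inp = interpreter.get_input_details()[0]
    dummy = np.random.random_sample(inp["shape"]).astype(inp["dtype"])
    interpreter.set_tensor(inp["index"], dummy)

    interpreter.invoke()  # warm-up run, excluded from the measurement

    start = time.perf_counter()
    for _ in range(runs):
        interpreter.invoke()
    return (time.perf_counter() - start) / runs * 1000.0

for path in ["model_a.tflite", "model_b.tflite"]:  # placeholder names
    print(path, f"{mean_latency_ms(path):.1f} ms per inference")
```

If the end-to-end ratio here reproduces the 10x gap you saw, the benchmark tool's op profiling will tell you which layers account for it.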