Search code examples
deep-learningartificial-intelligenceface-recognition

What is the efficient way to get compressed ML models in multiple sizes of one model for non ML expert?


I'm not an ML expert and know just a little background about it. I know that there are techniques to reduce the size of neural networks like distillation and pruning. But I don't know how to efficiently perform those techniques.

Now I need to solve a quite practical problem. I'd like to ship FaceNet, the face recognition model, to mobile devices. There might be trade-offs between recognition accuracy and performance + size. I don't know which size of model would fit best my demands. I think I need to test models in many sizes and figure out which one is the best empirically. To do this, I should acquire models in many sizes that are obtained by compressing the pre-trained FaceNet on its website. For example, 30Mb version of FaceNet, 40Mb version of FaceNet, etc.

However, model compression is not free and costs money. I'm worried that I'd do something very stupid and expensive. What is the recommended way to do this? Does this problem have any common solution for not ML expert?


Solution

  • Network "compression" is not like a slider in which you can move anywhere from slow/accurate to fast/not-accurate.

    From your starting model you can apply some techniques which may or may not reduce the size of the network and may or may not reduce also the accuracy. For example converting weights from float32 to float16 will for sure reduce the memory requirement of the network by half, but can also introduce a small reduction in accuracy and it's not supported on all devices.

    My suggestion is: start with the base model and perform some tests with it. Understand how far you are from the target FPS and size of your application and then decide which approach makes more sense to reach the objective you have in mind.

    Since you said you're not an ML expert I think this kind of task (at least to my knowledge) requires some amount of studying of the subject and experimentation and I would start with an open source solution like for example https://tvm.apache.org/.