
Minimal (light version) PyTorch and NumPy packages in production


I am putting a model into production and am required to scan all dependencies (PyTorch and NumPy) beforehand with a Veracode scan.

I noticed that the majority of the flaws come from test scripts and caffe2 modules in PyTorch and NumPy.

Is there any way to build or install only the parts of these packages that my application uses? (For example, I won't use the testing or caffe2 code in the application, so there is no need to have it in my PyTorch / NumPy source code.)


Solution

    1. PyInstaller

    You could package your application using PyInstaller. This tool bundles your app with the Python interpreter and its dependencies, and includes only the parts you need (this is a simplification: in practice it is hard to trace exactly what your package uses, so some extra modules will be bundled as well).

    Also, expect some quirks and workarounds to make it work with PyTorch and NumPy, as those dependencies are quite heavy (especially PyTorch).
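
As a rough illustration of the PyInstaller route, the sketch below only assembles a command line that uses PyInstaller's `--exclude-module` option; it does not run the build. The script name `serve_model.py`, the helper `build_pyinstaller_command`, and the excluded module names are hypothetical placeholders. Whether a given module can be excluded without breaking imports depends on the PyTorch and NumPy versions, so the resulting bundle has to be tested.

```python
import shlex


def build_pyinstaller_command(entry_script, excluded_modules, name="app"):
    """Assemble a PyInstaller command that skips the given modules.

    Hypothetical helper: it only builds the argument list and never
    invokes PyInstaller itself.
    """
    cmd = ["pyinstaller", "--name", name]
    for module in excluded_modules:
        # --exclude-module tells PyInstaller to leave this module out of the bundle.
        cmd += ["--exclude-module", module]
    cmd.append(entry_script)
    return cmd


# Placeholder module names; check that your app still imports without them.
cmd = build_pyinstaller_command("serve_model.py", ["caffe2", "numpy.tests"])
print(shlex.join(cmd))
# pyinstaller --name app --exclude-module caffe2 --exclude-module numpy.tests serve_model.py
```

The printed command can be run in a shell, or the list can be passed to `subprocess.run` once the exclusions have been checked against the real application.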

    2. Use only PyTorch

    NumPy and PyTorch are quite similar feature-wise (PyTorch tries to be compatible with NumPy), so you may be able to use only one of them, which would simplify things further.
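
To judge how much work dropping NumPy would be, it can help to list which NumPy names the application actually touches. The sketch below is one possible way to do that with the standard-library `ast` module; the function name `module_attributes_used` is made up for illustration, and it only handles the simple `import numpy` / `import numpy as np` / `from numpy import ...` forms, so treat its output as a starting point rather than a complete inventory.

```python
import ast


def module_attributes_used(source, module="numpy"):
    """Return the sorted names accessed on `module` in the given source text."""
    tree = ast.parse(source)
    aliases = set()  # local names bound to the module, e.g. "np"
    used = set()     # attribute or imported names taken from the module

    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name == module:
                    aliases.add(alias.asname or alias.name)
        elif isinstance(node, ast.ImportFrom) and node.module == module:
            for alias in node.names:
                used.add(alias.name)

    for node in ast.walk(tree):
        # Matches expressions such as np.array or np.mean.
        if (isinstance(node, ast.Attribute)
                and isinstance(node.value, ast.Name)
                and node.value.id in aliases):
            used.add(node.attr)

    return sorted(used)


sample = "import numpy as np\nx = np.array([1.0, 2.0])\nprint(np.mean(x))\n"
print(module_attributes_used(sample))
# ['array', 'mean']
```

Each name in the result can then be checked against PyTorch's API to see whether an equivalent exists for your use.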

    3. Use C++

    Depending on the other parts of your app, you could write it (or at least the neural network) in C++ using PyTorch's C++ frontend, which has been stable since the 1.5.0 release.

    Going this route would allow you to compile PyTorch's .cpp source code statically (so all dependencies are linked in), which gives a relatively small binary (30 MB compared to PyTorch's 1 GB+), but it requires a lot of work.