I'm using ONNX Runtime to run inference on deep learning models. Let's say I have 4 different models, each with its own input image. Can I run them in parallel in 4 threads? Would there be one "environment" and then 4 sessions (using the same environment)?
Yes - one environment and 4 separate sessions is how you'd do it.
The 'read-only state' (weights and biases) is specific to a model. A session has a 1:1 relationship with a model, and that state isn't shared across sessions. You also only need one session per model, since you can call Run on the same session concurrently with different inputs (assuming the model supports dynamic batch/input sizes).
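For illustration, here's a minimal C++ sketch of that setup: one shared Ort::Env, four sessions, and one thread per session. The model paths, tensor names, and the {1, 3, 224, 224} input shape are placeholder assumptions, not anything from your question.

```cpp
// Sketch only: one Ort::Env shared by four sessions, each driven from its
// own std::thread. Model paths, tensor names, and the input shape are
// placeholders -- substitute your own. (On Windows, Ort::Session takes
// wide-character ORTCHAR_T paths.)
#include <onnxruntime_cxx_api.h>

#include <array>
#include <cstdint>
#include <thread>
#include <vector>

int main() {
  // One environment for the whole process.
  Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "multi-model");

  const std::array<const char*, 4> model_paths = {
      "model0.onnx", "model1.onnx", "model2.onnx", "model3.onnx"};

  // One session per model, all sharing the same environment.
  Ort::SessionOptions opts;  // default: per-session thread pools
  std::vector<Ort::Session> sessions;
  sessions.reserve(model_paths.size());
  for (const char* path : model_paths) {
    sessions.emplace_back(env, path, opts);
  }

  auto run_one = [](Ort::Session& session) {
    // Assume a single float input of shape {1, 3, 224, 224}; adapt per model.
    std::array<int64_t, 4> shape{1, 3, 224, 224};
    std::vector<float> data(1 * 3 * 224 * 224, 0.0f);
    Ort::MemoryInfo mem_info =
        Ort::MemoryInfo::CreateCpu(OrtDeviceAllocator, OrtMemTypeCPU);
    Ort::Value input = Ort::Value::CreateTensor<float>(
        mem_info, data.data(), data.size(), shape.data(), shape.size());
    const char* input_names[] = {"input"};    // placeholder tensor name
    const char* output_names[] = {"output"};  // placeholder tensor name
    auto outputs = session.Run(Ort::RunOptions{nullptr}, input_names, &input,
                               1, output_names, 1);
    // ... use outputs[0] ...
  };

  // Four threads, one session each, running concurrently.
  std::vector<std::thread> workers;
  for (auto& s : sessions) workers.emplace_back(run_one, std::ref(s));
  for (auto& t : workers) t.join();
  return 0;
}
```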
Regarding threading, the default is per-session thread pools, but it's also possible to share global thread pools across sessions.
How you do that differs by API:
- C API: create the environment with CreateEnvWithGlobalThreadPools.
- C++ API: pass an OrtThreadingOptions when constructing Ort::Env.
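A sketch of the global-thread-pool variant in the C++ API might look like the following. The thread counts and model path are arbitrary examples; note that each session also has to opt out of its per-session pools via SessionOptions::DisablePerSessionThreads to actually use the global ones.

```cpp
// Sketch only: shared global thread pools, owned by the environment.
#include <onnxruntime_cxx_api.h>

int main() {
  Ort::ThreadingOptions tp_options;
  tp_options.SetGlobalIntraOpNumThreads(4);  // example size for the shared intra-op pool
  tp_options.SetGlobalInterOpNumThreads(1);  // example size for the shared inter-op pool

  // The environment owns the global pools.
  Ort::Env env(tp_options, ORT_LOGGING_LEVEL_WARNING, "shared-pools");

  // Each session must disable its per-session pools to use the global ones.
  Ort::SessionOptions opts;
  opts.DisablePerSessionThreads();

  Ort::Session session(env, "model0.onnx", opts);  // placeholder path
  // ... create the remaining sessions the same way, then call Run as usual.
  return 0;
}
```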