Tags: python, opencv, object-detection, python-multithreading

Why does Python threading slow down inference time for Faster R-CNN?


I am working on a traffic-tracking system that analyses video that has already been collected. I am using OpenCV, threading, PyTorch, and Detectron2. To speed up frame grabbing from OpenCV, I decided to use a Thread that runs a loop filling a Queue with frames, as described in this blog post. After implementing this I can access frames as fast as the rest of my processing pipeline can consume them, so no problems there. The problem is that inference on a frame (just a forward pass through the Faster R-CNN model) now takes 5+ seconds, compared to 0.11 s before. My GPU is being used and my CPU is heavily under-utilised. What could cause this to happen?
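For reference, the producer-thread pattern I implemented looks roughly like this (a minimal sketch: the hypothetical `make_source` stands in for `cv2.VideoCapture.read()` so the snippet runs standalone without a video file):

```python
import queue
import threading

class FrameGrabber:
    """Producer thread that keeps a bounded queue topped up with frames."""

    def __init__(self, read_frame, maxsize=128):
        self.read_frame = read_frame          # callable returning (ok, frame)
        self.frames = queue.Queue(maxsize=maxsize)
        self.stopped = False
        self.thread = threading.Thread(target=self._fill, daemon=True)

    def start(self):
        self.thread.start()
        return self

    def _fill(self):
        # Runs in the background thread, filling the queue until the
        # source is exhausted or stop is requested.
        while not self.stopped:
            ok, frame = self.read_frame()
            if not ok:
                self.stopped = True
                break
            self.frames.put(frame)            # blocks when the queue is full

    def read(self):
        # Consumer side: blocks until a frame is available.
        return self.frames.get()

def make_source(n=10):
    """Dummy frame source: yields integers 0..n-1 as stand-in 'frames'."""
    it = iter(range(n))
    def read_frame():
        try:
            return True, next(it)
        except StopIteration:
            return False, None
    return read_frame

grabber = FrameGrabber(make_source()).start()
frames = [grabber.read() for _ in range(10)]
print(frames)  # → [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

With a real capture, `read_frame` would be `cap.read` on a `cv2.VideoCapture` object, and the bounded queue keeps the grabber from racing arbitrarily far ahead of the consumer.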


Solution

  • CPython has a Global Interpreter Lock (GIL). This means the interpreter holds one big lock that prevents Python bytecode from being evaluated in multiple threads simultaneously.

    Packages implemented in C that provide access to high-level operations can often release the GIL while they run, but if your processing code is mostly or entirely Python and CPU-bound, you won't get any speedup from multithreading: your threads will fight one another for the lock, and the processing ends up completely sequential because of the GIL.
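    You can see this directly by timing a pure-Python CPU-bound loop run twice sequentially versus in two threads (a small demonstration, not part of the original pipeline):

    ```python
    import threading
    import time

    def count(n):
        # Pure-Python CPU-bound loop: holds the GIL while it runs.
        while n > 0:
            n -= 1

    N = 5_000_000

    t0 = time.perf_counter()
    count(N)
    count(N)
    sequential = time.perf_counter() - t0

    t0 = time.perf_counter()
    workers = [threading.Thread(target=count, args=(N,)) for _ in range(2)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    threaded = time.perf_counter() - t0

    # Because only one thread can execute bytecode at a time, the threaded
    # run takes roughly as long as (and often longer than) the sequential one.
    print(f"sequential: {sequential:.2f}s  threaded: {threaded:.2f}s")
    ```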

    In that case you need multiprocessing to get speedups: the GIL is per-interpreter, so separate interpreters in separate processes don't interfere with one another. The communication and synchronisation costs are higher, though.
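    A minimal sketch of the multiprocessing variant of the same CPU-bound work, using the standard library's `multiprocessing.Pool` (each worker is a separate interpreter with its own GIL):

    ```python
    import multiprocessing as mp

    def count(n):
        # Same pure-Python CPU-bound loop; returns how many steps it took.
        total = 0
        while n > 0:
            n -= 1
            total += 1
        return total

    if __name__ == "__main__":
        N = 2_000_000
        # Two worker processes can genuinely run in parallel on a
        # multi-core machine, unlike two threads under the GIL.
        with mp.Pool(processes=2) as pool:
            results = pool.map(count, [N, N])
        print(results)  # → [2000000, 2000000]
    ```

    Note the `if __name__ == "__main__":` guard, which is required on platforms that spawn rather than fork worker processes. The trade-off is that arguments and results are pickled and sent between processes, so this only pays off when the per-task work outweighs that overhead.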