Tags: multithreading, opengl, semaphore, gpuimage

Why does GPUImage use semaphores and processing queues instead of a thread with a runloop?


As far as I understand, GPUImage performs a DAG traversal and uses semaphores to protect OpenGL usage, treating it (together with the framebuffer texture cache) as a single-use resource.

Is there a reason to use semaphores here? Do they not unnecessarily complicate the situation? What benefit do they provide, and what kinds of problems would be encountered by instead giving each filter DAG its own thread running a runloop? Were there particular design considerations that informed the decision for the current GPUImage architecture?


Solution

  • When working with an OpenGL(ES) context, bad things happen if you access it from more than one thread at a time. You could simply run all of your rendering and interaction code on the main thread, but that would interfere with your UI and would halt any image or video processing during UI events (like pulling down a menu). There are also significant performance advantages to doing OpenGL(ES) rendering on a background thread.

    Therefore, you need a way to perform OpenGL(ES) rendering on a background thread while still protecting against simultaneous access. Manually created threads and locks would be one way to do this, but locks have significant performance overhead and properly managing manually created threads can add a lot of code (and the potential to waste resources).

    A one-block-at-a-time Grand Central Dispatch queue is an efficient and relatively easy way to provide safe, lock-free access to a shared resource like this. Any place you want to do OpenGL(ES) rendering on your context, simply wrap it in a block to be dispatched on the context's serial dispatch queue. That makes it easy to see where these accesses take place in your code, and spares you from the performance and code overhead of maintaining manual threads, runloops, and locks.
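
    As a rough illustration of that pattern (a sketch only; the class, queue label, and method names below are made up, not GPUImage's actual identifiers), you pair the GL context with its own serial dispatch queue and funnel every piece of GL work through it, for example in Swift:

        import Dispatch
        import OpenGLES

        // Hypothetical sketch of the serial-queue pattern described above:
        // one serial queue per GL context, and all GL work runs as blocks on it.
        final class GLContextGuard {
            private let contextQueue = DispatchQueue(label: "com.example.openGLESContextQueue")
            private let context = EAGLContext(api: .openGLES2)!

            // Wrap GL work in a block on the serial queue so the context is
            // never touched from two threads at once.
            func render(_ work: @escaping () -> Void) {
                contextQueue.async {
                    EAGLContext.setCurrent(self.context)
                    work()      // issue GL calls here
                    glFlush()
                }
            }
        }

    Callers just submit their drawing as blocks; the queue serializes them, which gives the one-block-at-a-time behavior described above.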

    I discuss the reasons why I use dispatch semaphores in my answer here; in short, they are a way of selectively dropping incoming frames in response to load.

    With a serial dispatch queue like this, I want to make sure that at any given time I only have a single image or video frame working its way through the queue. With a single GPU, there is no advantage to rendering more than one image at a time.

    However, if you've got a camera providing frames to be processed at 30-60 frames per second, and your processing pipeline occasionally takes more than 1/30th or 1/60th of a second to operate on these images, you have to make a decision. Do you drop the incoming frames, or do you enqueue them to be processed? If it's the latter, you'll keep building up more and more frames in your queue until you exhaust available processing and memory resources, and you'll also see a larger and larger lag in processing.

    The dispatch semaphore allows me to immediately drop frames if there is one already being processed in the serial dispatch queue, and to do so in a performant and safe manner. It also only adds a few lines of code, almost all of which are found in my answer here (this is even shorter and more readable in Swift 3).
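
    To make that concrete, here is a rough Swift sketch of the frame-dropping pattern (hypothetical names, not GPUImage's exact code): the semaphore starts at 1, so at most one frame is ever in flight on the serial queue, and any frame that arrives while one is being processed is discarded immediately instead of being enqueued.

        import Dispatch
        import CoreVideo

        // Hypothetical sketch of the frame-dropping pattern, not GPUImage's code.
        final class FrameProcessor {
            // Starts at 1, so at most one frame can be in flight at a time.
            private let frameRenderingSemaphore = DispatchSemaphore(value: 1)
            private let processingQueue = DispatchQueue(label: "com.example.videoProcessingQueue")

            // Called from the camera callback at 30-60 fps; `process` stands in
            // for whatever filter chain operates on the frame.
            func handle(_ frame: CVPixelBuffer, process: @escaping (CVPixelBuffer) -> Void) {
                // Try to take the semaphore without waiting: if a frame is already
                // being processed, drop this one instead of queueing it.
                guard frameRenderingSemaphore.wait(timeout: .now()) == .success else { return }

                processingQueue.async {
                    process(frame)
                    // Release the semaphore so the next incoming frame can enter.
                    self.frameRenderingSemaphore.signal()
                }
            }
        }

    Dropped frames never pile up in the queue or in memory, so under load the pipeline simply skips frames rather than falling further and further behind.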

    The architecture I describe above has been thoroughly profiled, and has been the best solution I've found for these needs. I've used it for years to provide 60 FPS OpenGL ES rendering of molecular models on older iOS hardware, live machine vision processing on Macs, and realtime video filtering on iOS. It's proven to be pretty solid and easy to maintain, given all the things that can go wrong with multithreaded code. The overhead from the GCD queues and semaphores has never been close to a performance bottleneck in my video rendering.