I recently started learning the Metal framework so I can write some filters for my Swift app. I am about to write a Metal kernel that dithers a picture using error-diffusion dithering. Each pixel is assigned a color, and the difference between that color and the pixel's original color is then distributed to neighbouring pixels. Because these values spread across the whole image as each pixel is calculated, all the pixels depend on each other. My example is Floyd-Steinberg dithering.
With the way Metal deals with threading, this dithering method won’t work: when dithering an image, the pixels can only be computed in order from first to last. Is it possible to have a kernel that doesn’t involve threading, or a way to have the whole image array computed by a single thread?
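To make the dependency concrete, here is a minimal CPU sketch of the sequential algorithm in Swift. It assumes a grayscale image stored row by row as `Float` values in 0...1, a 1-bit output, and the standard Floyd-Steinberg weights (7/16, 3/16, 5/16, 1/16). The function name `floydSteinberg` is made up for illustration and is not part of any Apple API.

```swift
// Sequential Floyd-Steinberg dithering on the CPU (1-bit output).
// `floydSteinberg` is a hypothetical helper name, not an Apple API.
func floydSteinberg(_ gray: [Float], width: Int, height: Int) -> [UInt8] {
    precondition(gray.count == width * height, "buffer size must match dimensions")
    var work = gray  // working copy that accumulates the diffused error
    var out = [UInt8](repeating: 0, count: gray.count)

    for y in 0..<height {
        for x in 0..<width {
            let i = y * width + x
            let old = work[i]
            let quantized: Float = old >= 0.5 ? 1 : 0  // snap to black or white
            out[i] = quantized == 1 ? 1 : 0
            let err = old - quantized

            // Push the quantization error onto pixels that have not been visited yet.
            if x + 1 < width { work[i + 1] += err * 7 / 16 }
            if y + 1 < height {
                if x > 0 { work[i + width - 1] += err * 3 / 16 }
                work[i + width] += err * 5 / 16
                if x + 1 < width { work[i + width + 1] += err * 1 / 16 }
            }
        }
    }
    return out
}
```

Each pixel reads a value that earlier pixels have already modified, which is why the loop order matters and why a naive one-thread-per-pixel kernel gives wrong results.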
In short, no: you cannot do that with GPU computing, because the GPU approach is implicitly parallel. That means one result cannot depend on all the other results. What you could try is breaking the computation down into stages, so that one stage at a time can be done in parallel. Whether that works depends on what your computation logic does, though. If you only want to use "one thread" on the GPU, it would likely be faster to just do the computation on the CPU instead. If you are interested, I wrote up an approach to Rice decompression with Metal that you might want to read. That approach does block-by-block segmentation of a decompression task.
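As one possible illustration of the staging idea, here is a CPU sketch in Swift; it is not taken from the write-up mentioned above. It assumes the standard Floyd-Steinberg weights, a grayscale `Float` image in 0...1, and a "gather" formulation in which each pixel reads the stored errors of its left, upper-left, upper, and upper-right neighbours. Under those assumptions, all pixels with the same value of `x + 2 * y` depend only on earlier stages, so each stage could in principle become one kernel dispatch. The names `incomingError` and `ditherWavefront` are hypothetical.

```swift
// Error arriving at (x, y) from its four already-quantized neighbours.
// Neighbours outside the image contribute nothing.
func incomingError(_ err: [Float], _ x: Int, _ y: Int, _ width: Int) -> Float {
    func e(_ nx: Int, _ ny: Int) -> Float {
        return (nx >= 0 && nx < width && ny >= 0) ? err[ny * width + nx] : 0
    }
    let left: Float = e(x - 1, y) * 7 / 16
    let upLeft: Float = e(x - 1, y - 1) * 1 / 16
    let up: Float = e(x, y - 1) * 5 / 16
    let upRight: Float = e(x + 1, y - 1) * 3 / 16
    return left + upLeft + up + upRight
}

// Staged ("wavefront") Floyd-Steinberg: stage t holds every pixel with x + 2y == t.
func ditherWavefront(_ gray: [Float], width: Int, height: Int) -> [UInt8] {
    precondition(gray.count == width * height, "buffer size must match dimensions")
    var out = [UInt8](repeating: 0, count: gray.count)
    var err = [Float](repeating: 0, count: gray.count)  // per-pixel quantization error
    guard width > 0, height > 0 else { return out }

    let lastStage = (width - 1) + 2 * (height - 1)
    for t in 0...lastStage {
        // Pixels inside one stage do not read each other's results, so on a GPU
        // this inner loop could be a single dispatch with one thread per pixel.
        for y in 0..<height {
            let x = t - 2 * y
            if x < 0 || x >= width { continue }
            let i = y * width + x
            let value = gray[i] + incomingError(err, x, y, width)
            let quantized: Float = value >= 0.5 ? 1 : 0
            out[i] = quantized == 1 ? 1 : 0
            err[i] = value - quantized
        }
    }
    return out
}
```

Each stage contains at most about half a row's worth of pixels, and the number of stages grows with `width + 2 * height`, so the available parallelism is limited. Whether this beats a plain CPU loop would have to be measured.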