In the documentation of threadgroups, it says: "Threads are organized into threadgroups that are executed together and can share a common block of memory. While sometimes kernel functions are designed so that threads run independently of each other, it's also common for threads in a threadgroup to collaborate on their working set."
So far, I have only worked on GPGPU programs in which each thread works on its own independent task. Could someone give me an example of how threads in a threadgroup can work together? How would they use the shared (threadgroup) memory to collaborate?
The thing with threadgroups is that they work on SIMD-like data sets: you execute one instruction and it operates on multiple data elements. There are special Metal instructions that let threads in a threadgroup pass results directly to other threads, but those exist only in the macOS implementation, not iOS. There is, however, another kind of "working together" that is possible in a compute shader. If you want to see an example of an advanced Metal-based Rice decoder that uses a compute shader to operate on multiple input elements and then emit multiple output elements, take a look at my Rice decoder for Metal. That code decompresses data essentially byte by byte, but the reads are done 4 bytes at a time so that SIMD execution yields the best possible performance.
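To make the "collaborate through shared memory" idea from the documentation concrete, here is a minimal sketch (not taken from the Rice decoder) of a parallel reduction in the Metal Shading Language. The kernel name `sum_per_threadgroup`, the fixed threadgroup size of 256, and the buffer indices are all assumptions made for this illustration.

```metal
#include <metal_stdlib>
using namespace metal;

// Hypothetical kernel: each threadgroup cooperatively sums 256 input values
// and writes one partial sum per threadgroup. Assumes the kernel is dispatched
// with a threadgroup size of exactly 256 threads.
kernel void sum_per_threadgroup(device const float *input   [[buffer(0)]],
                                device       float *partial [[buffer(1)]],
                                uint tid   [[thread_position_in_threadgroup]],
                                uint gid   [[thread_position_in_grid]],
                                uint group [[threadgroup_position_in_grid]])
{
    // Shared (threadgroup) memory, visible to every thread in this threadgroup.
    threadgroup float shared[256];

    // Each thread loads one element from device memory into shared memory.
    shared[tid] = input[gid];

    // Wait until every thread in the group has written its value.
    threadgroup_barrier(mem_flags::mem_threadgroup);

    // Tree reduction: in each step, half of the threads add a value that was
    // produced by a different thread, so the threads genuinely collaborate.
    for (uint stride = 128; stride > 0; stride >>= 1) {
        if (tid < stride) {
            shared[tid] += shared[tid + stride];
        }
        threadgroup_barrier(mem_flags::mem_threadgroup);
    }

    // Thread 0 writes this threadgroup's partial sum back to device memory.
    if (tid == 0) {
        partial[group] = shared[0];
    }
}
```

The barriers are what make the collaboration safe: without `threadgroup_barrier(mem_flags::mem_threadgroup)` between steps, a thread could read a neighbor's slot in shared memory before that neighbor has finished writing it.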