Inactive invocations in subgroups in Vulkan

I am reading the Vulkan subgroup tutorial, and it mentions that if the local workgroup size is less than the subgroup size, then we will always have inactive invocations.

This post clarifies that there is no direct relation between a SubgroupLocalInvocationId and a LocalInvocationId. If there is no relation between the subgroup and local workgroup IDs, how does a small local workgroup size guarantee inactive invocations?

My guess is as follows:

I think the invocations (threads) in a workgroup are divided into subgroups before executing on the GPU. Each subgroup would exactly match the basic unit of execution on the GPU (a warp on an NVIDIA GPU). This means that if the workgroup size is smaller than the subgroup size, the system somehow has to construct a minimal subgroup that can be executed on the GPU. This would require padding it with some "inactive/dead" invocations just to meet the minimum subgroup size, leading to the aforementioned guaranteed inactive invocations. Is this understanding correct? (I deliberately used basic words for simplicity; please let me know if any of the terminology is incorrect.)

Thanks

Solution

The parameters of a compute dispatch define the global workgroup. The global workgroup consists of x×y×z local workgroups.

The size of each local workgroup is defined by the shader. A local workgroup has its own set of x×y×z invocations.
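To make the two levels of x×y×z concrete, here is a minimal arithmetic sketch. It assumes the dispatch arguments count local workgroups (as the groupCountX/Y/Z arguments of vkCmdDispatch do) and that the shader's local size gives the invocations per local workgroup. The `Extent3` type and the function names are hypothetical helpers for illustration, not part of the Vulkan API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type: one value per dimension (x, y, z).
struct Extent3 {
    std::uint32_t x, y, z;
};

// Number of local workgroups in the global workgroup.
// `group_count` stands for the three counts passed to the dispatch call.
std::uint64_t workgroup_count(Extent3 group_count) {
    return std::uint64_t{group_count.x} * group_count.y * group_count.z;
}

// Number of invocations in one local workgroup.
// `local_size` stands for the local size declared by the shader.
std::uint64_t invocations_per_workgroup(Extent3 local_size) {
    assert(local_size.x > 0 && local_size.y > 0 && local_size.z > 0);
    return std::uint64_t{local_size.x} * local_size.y * local_size.z;
}

// Total invocations launched by one dispatch.
std::uint64_t total_invocations(Extent3 group_count, Extent3 local_size) {
    return workgroup_count(group_count) * invocations_per_workgroup(local_size);
}
```

For example, dispatching 4×2×1 local workgroups of size 8×8×1 gives 8 local workgroups of 64 invocations each, so `total_invocations({4, 2, 1}, {8, 8, 1})` is 512.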

A local workgroup is partitioned into subgroups; that is, its invocations are rearranged into subgroups. A subgroup has SubgroupSize invocation slots (indexed in one dimension), and not all of them need to be assigned a local workgroup invocation. A subgroup must not span multiple local workgroups; it can use only invocations from a single local workgroup.

How this partitioning is done is otherwise largely unspecified, except that under very specific conditions you are guaranteed full subgroups, meaning that no invocation slot in a subgroup of SubgroupSize stays vacant. If those conditions are not fulfilled, the driver may keep some invocations in a subgroup inactive as it sees fit.

If the local workgroup has fewer invocations in total than SubgroupSize, then some of the invocations of the subgroup must indeed stay inactive, as there are not enough local workgroup invocations available to fill even one subgroup.
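The counting argument can be sketched as a lower bound. This assumes a single fixed subgroup size and that a subgroup never spans local workgroups; the function names are hypothetical. Since the actual partitioning is up to the driver, the real number of inactive invocations may be higher than this bound, but never lower.

```cpp
#include <cassert>
#include <cstdint>

// Fewest subgroups that can hold all invocations of one local workgroup,
// given that each subgroup has `subgroup_size` slots.
std::uint64_t min_subgroups(std::uint64_t local_invocations,
                            std::uint64_t subgroup_size) {
    assert(subgroup_size > 0);
    // Ceiling division: a partially filled subgroup still counts as one.
    return (local_invocations + subgroup_size - 1) / subgroup_size;
}

// Lower bound on the inactive slots across the subgroups of one local
// workgroup: slots in the fewest possible subgroups minus real invocations.
std::uint64_t min_inactive_slots(std::uint64_t local_invocations,
                                 std::uint64_t subgroup_size) {
    return min_subgroups(local_invocations, subgroup_size) * subgroup_size
           - local_invocations;
}
```

With a local workgroup of 8 invocations and a subgroup size of 32, `min_inactive_slots(8, 32)` is 24: one subgroup is needed, and at least 24 of its 32 slots have nothing to run. Whenever the local workgroup is smaller than the subgroup size, the bound is strictly positive, which is the guarantee the tutorial refers to.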