...are just mentioned in the PTX manual. There is no hint about what they are good for or how to use them.
Does anyone know more? Am I just missing a common concept?
Bart's comment is basically right. In more detail, as stated in the PTX ISA 3.1 manual,
For some instructions the destination operand is optional. A “bit bucket” operand denoted with an underscore (_) may be used in place of a destination register.
There is actually only one class of instruction listed in the 3.1 PTX spec for which _ is a valid destination: atom. Here are the semantics of atom:
Atomically loads the original value at location a into destination register d, performs a reduction operation with operand b and the value in location a, and stores the result of the specified operation at location a, overwriting the original value.
And there is a note for atom:
Simple reductions may be specified by using the “bit bucket” destination operand ‘_’.
So, we can construct an example:
atom.global.add.s32 _, [a], 4;
This would add 4 to the signed integer at memory location a without returning the previous value of a in a register. So if you don't need the previous value, you can use this form. I assume the compiler would generate it for code like this:
atomicAdd(&a, 4);
since the return value of atomicAdd is not stored to a variable.
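To make the two cases concrete, here is a minimal CUDA sketch contrasting an atomicAdd whose return value is discarded with one whose return value is kept. The kernel names and the run_counter_demo helper are made up for illustration, and whether the discarded case really lowers to the bit bucket form is still only my assumption; compiling with nvcc -ptx and reading the output is one way to check what your toolkit emits.

```cuda
#include <cuda_runtime.h>

// Return value of atomicAdd is discarded: only the memory update matters.
// (Assumption to verify with `nvcc -ptx`: this is the case where the
// destination register is not needed.)
__global__ void add_discard(int *counter)
{
    atomicAdd(counter, 4);
}

// Return value is kept: each thread records the value it saw before its add,
// so the atomic must deliver the old value into a register.
__global__ void add_fetch(int *counter, int *old_values)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    old_values[tid] = atomicAdd(counter, 4);
}

// Hypothetical host helper: runs one of the kernels on a zeroed counter and
// returns the final counter value, or -1 if a CUDA call fails.
int run_counter_demo(bool keep_old, int blocks, int threads)
{
    int *counter = nullptr;
    int *old_values = nullptr;
    int result = -1;

    if (cudaMalloc(&counter, sizeof(int)) != cudaSuccess)
        return -1;
    if (cudaMalloc(&old_values, sizeof(int) * blocks * threads) != cudaSuccess) {
        cudaFree(counter);
        return -1;
    }
    cudaMemset(counter, 0, sizeof(int));

    if (keep_old)
        add_fetch<<<blocks, threads>>>(counter, old_values);
    else
        add_discard<<<blocks, threads>>>(counter);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    if (cudaMemcpy(&result, counter, sizeof(int),
                   cudaMemcpyDeviceToHost) != cudaSuccess)
        result = -1;

    cudaFree(old_values);
    cudaFree(counter);
    return result;
}
```

Both kernels leave the same value in memory: run_counter_demo(false, 4, 256) and run_counter_demo(true, 4, 256) each perform 1024 additions of 4. The only difference is whether the previous value has to come back to the thread.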