I am currently studying CUDA and learned that there are global memory and shared memory.
I have checked the CUDA documentation and found that GPUs can access shared memory and global memory using `ld.shared`/`st.shared` and `ld.global`/`st.global` instructions, respectively.
What I am curious about is what instruction is used to load data from global memory to shared memory?
It would be great if someone could let me know.
Thanks!
__global__ void my_function(int* global_mem)
{
__shared__ int shared_mem[10];
for(int i = 0; i < 10; i++) {
shared_mem[i] = global_mem[i]; // What instruction is used for this load operation?
}
}
In the case of
__shared__ float smem[2];
smem[0] = global_memory[0];
the operation is performed by a pair of SASS instructions:
LDG Rx, [Ry]
STS [Rz], Rx
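If you want to check this on your own kernels, you can disassemble the compiled output with cuobjdump from the CUDA toolkit. The file names and architecture below are placeholders; adjust them for your setup:

```shell
# Compile the kernel to a cubin, then disassemble it to SASS.
# Assumes the CUDA toolkit (nvcc, cuobjdump) is on PATH; adjust -arch
# and file names for your GPU and sources.
nvcc -arch=sm_70 -cubin my_function.cu -o my_function.cubin
cuobjdump -sass my_function.cubin
# In the output, look for the LDG (load from global) followed by
# STS (store to shared) for the copy loop.
```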
To expand a bit more, read https://forums.developer.nvidia.com/t/whats-different-between-ld-and-ldg-load-from-generic-memory-vs-load-from-global-memory/40856/2
Summary:

instruction | meaning |
---|---|
LDS | load from shared space |
LDC | load from constant space |
LDG | load from global space |
LD | generic load - space deduced from the supplied address |
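Worth adding: on Ampere (sm_80) and newer GPUs there is also a single instruction that copies from global memory directly into shared memory without staging through registers: `cp.async` in PTX, which appears as `LDGSTS` in SASS. A minimal sketch of the question's kernel using the CUDA pipeline API (assuming CUDA 11+ and an sm_80+ target; names mirror the question's code):

```cuda
#include <cuda_pipeline.h>

// Sketch: each __pipeline_memcpy_async copies 4 bytes from global
// memory directly into shared memory (SASS: LDGSTS on sm_80+),
// instead of the LDG + STS pair the plain assignment produces.
__global__ void my_function_async(int* global_mem)
{
    __shared__ int shared_mem[10];
    for (int i = 0; i < 10; i++) {
        __pipeline_memcpy_async(&shared_mem[i], &global_mem[i], sizeof(int));
    }
    __pipeline_commit();       // submit the batch of async copies
    __pipeline_wait_prior(0);  // block until all copies have landed
    // ... use shared_mem ...
}
```

Each `__pipeline_memcpy_async` call must copy 4, 8, or 16 bytes, and `__pipeline_wait_prior(0)` waits for all committed copies before the shared data is read.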