
In CUDA, what instruction is used to load data from global memory to shared memory?


I am currently studying CUDA and learned that GPUs have both global memory and shared memory.

I have checked the CUDA documentation and found that GPUs access shared memory and global memory using the ld.shared/st.shared and ld.global/st.global instructions, respectively.

What I am curious about is: what instruction is used to load data from global memory into shared memory?

It would be great if someone could let me know.

Thanks!

__global__ void my_function(int* global_mem)
{
    __shared__ int shared_mem[10];
    for(int i = 0; i < 10; i++) {
        shared_mem[i] = global_mem[i];  // What instruction is used for this load operation?
    }
}

Solution

  • In the case of

    __shared__ float smem[2];
    smem[0] = global_memory[0];
    

    the operation is, in SASS, the pair below (see the sketch after the summary table for a way to verify this yourself):

    LDG  Rx, [Ry]
    STS  [Rz], Rx
    

    To expand a bit more, read https://forums.developer.nvidia.com/t/whats-different-between-ld-and-ldg-load-from-generic-memory-vs-load-from-global-memory/40856/2

    Summary:

    Instruction   Meaning
    -----------   -------------------------------------------------------
    LDS           load from shared space
    LDC           load from constant space
    LDG           load from global space
    LD            generic load (space deduced from the supplied address)
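
    One quick way to confirm the LDG/STS pair shown above is to compile a small kernel and disassemble it. A minimal sketch, assuming nvcc and cuobjdump are available (the kernel and file names are placeholders, and -arch should match your GPU):

    // copy_kernel.cu - the global->shared copy below compiles to an LDG/STS pair
    __global__ void copy_kernel(const int* global_mem, int* out)
    {
        __shared__ int shared_mem[10];
        int i = threadIdx.x;
        if (i < 10) {
            shared_mem[i] = global_mem[i];  // LDG Rx, [Ry] then STS [Rz], Rx
        }
        __syncthreads();
        if (i < 10) {
            out[i] = shared_mem[i];         // LDS, then STG on the way back out
        }
    }
    // Build a cubin and dump its SASS, then look for LDG and STS in the listing:
    //   nvcc -arch=sm_70 -cubin -o copy_kernel.cubin copy_kernel.cu
    //   cuobjdump -sass copy_kernel.cubin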
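
    Similarly, to make the LD row of the table concrete: the compiler falls back to a generic load when it cannot prove which space a pointer refers to. A minimal sketch (the function name is made up; __noinline__ blocks inlining, which would otherwise let the compiler recover the address space and emit LDG):

    // Inside this non-inlined function the compiler cannot tell whether p
    // points to global, shared, or local memory, so the dereference is
    // typically compiled to a generic LD that the hardware resolves at
    // run time from the address itself.
    __device__ __noinline__ int load_via_generic(const int* p)
    {
        return *p;
    }

    __global__ void demo(const int* g, int* out)
    {
        __shared__ int s[32];
        s[threadIdx.x] = load_via_generic(g + threadIdx.x);  // generic LD, then STS
        __syncthreads();
        out[threadIdx.x] = s[threadIdx.x];
    }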