Tags: cuda, gpgpu, cuda-uva, unified-memory, mapped-memory

GPU memory oversubscription with mapped memory, Unified Virtual Addressing and Unified Memory


I'm considering ways to process data on a GPU when the data is too big to fit in GPU memory, and I have a few questions.

If I understand correctly, with mapped memory the data resides in main memory and is transferred to the GPU only when accessed, so it shouldn't be a problem to allocate more data than fits into GPU memory.

UVA is similar to mapped memory, but the data can be stored in both CPU and GPU memory. Can the GPU then still access main memory (as with mapped memory) when its own memory is full? Can a memory overflow happen in this case? I've read that with mapped memory the data goes directly to the local memory without first being transferred to global memory, in which case there shouldn't be any overflow. Is that true, and if so, is it also true for UVA?

In CUDA 6.0, UM doesn't allow oversubscribing GPU memory (and generally doesn't allow allocating more memory than the GPU has, even in main memory), but with CUDA 8.0 this becomes possible (https://devblogs.nvidia.com/parallelforall/beyond-gpu-memory-limits-unified-memory-pascal/). Did I get that right?


Solution

  • Yes, with the mapped (i.e. pinned, "zero-copy") method, the data stays in host memory and is transferred to the GPU on demand, but it never becomes resident in GPU memory (unless GPU code stores it there). If you access it multiple times, it may need to be transferred from the host multiple times.
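
    A minimal sketch of this zero-copy pattern, assuming a hypothetical kernel (`scale`) and an illustrative size; only the runtime calls (`cudaHostAlloc`, `cudaHostGetDevicePointer`) are standard CUDA API:

    ```cuda
    #include <cstdio>
    #include <cuda_runtime.h>

    // Hypothetical kernel: doubles each element. Every access reaches
    // across the bus into host memory; nothing becomes resident on the GPU.
    __global__ void scale(float *data, size_t n)
    {
        size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            data[i] *= 2.0f;
    }

    int main()
    {
        const size_t n = 1 << 20;  // illustrative; could exceed GPU memory
        float *h_ptr = nullptr, *d_ptr = nullptr;

        // Pinned host allocation, mapped into the device address space.
        cudaHostAlloc(&h_ptr, n * sizeof(float), cudaHostAllocMapped);
        // Device-side pointer to the same host allocation.
        cudaHostGetDevicePointer(&d_ptr, h_ptr, 0);

        for (size_t i = 0; i < n; ++i) h_ptr[i] = 1.0f;

        scale<<<(n + 255) / 256, 256>>>(d_ptr, n);
        cudaDeviceSynchronize();

        printf("h_ptr[0] = %f\n", h_ptr[0]);  // prints 2.000000
        cudaFreeHost(h_ptr);
        return 0;
    }
    ```

    Note that launching the kernel twice would pull the data across the bus twice; that is the "transferred multiple times" behavior mentioned above.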

UVA (Unified Virtual Addressing, see here) is not the same thing as UM (Unified Memory, see here) or managed memory (== UM), so I shall refer to this case as UM, not UVA.

    With UM on a pre-Pascal device, UM "managed" allocations are moved automatically between CPU and GPU, subject to some restrictions, but you cannot oversubscribe GPU memory. The total of all ordinary GPU allocations plus UM allocations cannot exceed GPU physical memory.

    With UM plus CUDA 8.0 or later plus a Pascal or newer GPU, you can oversubscribe GPU memory with UM ("managed") allocations. Such allocations are nominally limited to the size of your system memory (minus whatever other demands there are on system memory). In this case, data is moved back and forth automatically between host and device memory by the CUDA runtime, using a demand-paging method.
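
    A minimal sketch of oversubscription under those assumptions (CUDA 8.0+, Pascal or newer GPU); the `touch` kernel and the 1.5x sizing are illustrative, not from this answer:

    ```cuda
    #include <cstdio>
    #include <cuda_runtime.h>

    // Hypothetical kernel: touches every byte via a grid-stride loop.
    // Pages fault and migrate to the GPU on demand.
    __global__ void touch(char *p, size_t n)
    {
        size_t stride = (size_t)gridDim.x * blockDim.x;
        for (size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
             i < n; i += stride)
            p[i] = 1;
    }

    int main()
    {
        size_t free_b = 0, total_b = 0;
        cudaMemGetInfo(&free_b, &total_b);

        // Ask for 1.5x the GPU's physical memory (it must still fit in
        // system memory). On a pre-Pascal device this managed allocation
        // fails, since it cannot exceed GPU physical memory there.
        size_t n = total_b + total_b / 2;
        char *p = nullptr;
        cudaError_t err = cudaMallocManaged(&p, n);
        if (err != cudaSuccess) {
            printf("cudaMallocManaged: %s\n", cudaGetErrorString(err));
            return 1;
        }

        touch<<<256, 256>>>(p, n);  // runtime demand-pages data in and out
        cudaDeviceSynchronize();
        printf("touched %zu bytes; GPU has %zu bytes total\n", n, total_b);
        cudaFree(p);
        return 0;
    }
    ```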

    UVA is not an actual data management technique in CUDA. It is an underlying technology that enables certain features, such as aspects of mapped memory, and it generally enables UM features as well.