Consider the following program (written in C syntax):
#include <cuda.h>
#include <stdio.h>
#include <stdlib.h>
int main() {
    CUresult result;
    unsigned int init_flags = 0;
    result = cuInit(init_flags);
    if (result != CUDA_SUCCESS) { exit(EXIT_FAILURE); }
    CUcontext ctx;
    unsigned int ctx_create_flags = 0;
    CUdevice device_id = 0;
    result = cuCtxCreate(&ctx, ctx_create_flags, device_id);
    // Note: The created context is also made the current context,
    // so we are _in_ a context from now on.
    if (result != CUDA_SUCCESS) { exit(EXIT_FAILURE); }
    CUdeviceptr requested = 0;
    CUdeviceptr reserved;
    size_t size = 0x20000;
    size_t alignment = 0; // default
    unsigned long long reserve_flags = 0;
    // -----------------------------------
    // ==>> FAILURE on next statement <<==
    // -----------------------------------
    result = cuMemAddressReserve(&reserved, size, alignment, requested, reserve_flags);
    if (result != CUDA_SUCCESS) {
        const char* error_string;
        cuGetErrorString(result, &error_string);
        fprintf(stderr, "cuMemAddressReserve() failed: %s\n", error_string);
        exit(EXIT_FAILURE);
    }
    return 0;
}
This fails when trying to make the reservation:
cuMemAddressReserve() failed: invalid argument
What's wrong with my arguments? Is it the size? The alignment? Requesting an address of 0? If it's the last of these, how can I even know what address to request when I don't really care?
tl;dr: Your reserved region size is not a multiple of (some device's) allocation granularity.
As @AbatorAbetor suggested, cuMemAddressReserve() implicitly requires the size of the memory region to be a multiple of some granularity value. And although 0x20000 seems like a generous enough value for that (2^17 bytes = 128 KiB, while system memory pages are typically 4 KiB = 2^12 bytes), NVIDIA GPUs are very demanding here.
For example, a Pascal GTX 1050 Ti GPU with ~4 GB of memory has a granularity of 0x200000, or 2 MiB, which is 16 times more than the size you were trying to reserve.
Now, what would happen if we had two devices with different granularity values? Would we need to use the least-common-multiple? Who knows.
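If it did turn out that a reservation has to satisfy every device's granularity, one conservative choice would be the least common multiple of the per-device values. The sketch below only shows that arithmetic; whether the driver actually requires it is not something I can confirm, and `gcd_size` and `common_granularity` are hypothetical helper names, not CUDA API functions.

```c
#include <stddef.h>

/* Greatest common divisor via Euclid's algorithm. */
static size_t gcd_size(size_t a, size_t b)
{
    while (b != 0) {
        size_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Hypothetical helper: least common multiple of all granularity values.
 * Returns 0 if any value is 0 or if the result would overflow size_t.
 * For an empty list it returns 1 (no constraint). */
size_t common_granularity(const size_t *granularities, size_t count)
{
    size_t result = 1;
    for (size_t i = 0; i < count; i++) {
        if (granularities[i] == 0) { return 0; }
        size_t factor = granularities[i] / gcd_size(result, granularities[i]);
        if (result > (size_t)-1 / factor) { return 0; } /* would overflow */
        result *= factor;
    }
    return result;
}
```

Since granularities are powers of two in the examples I have seen, the least common multiple would simply be the largest value; the general form is kept here only because nothing guarantees that.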
Anyway, bottom line: Always check the granularity both before allocating and before reserving.
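Here is a minimal sketch of the size fix-up, assuming you have already obtained the granularity from the driver. `round_up_to_granularity` is a hypothetical helper name, not part of the CUDA API; the driver query is shown only as a comment so that the snippet compiles without `cuda.h`.

```c
#include <stddef.h>

/* With the driver API, the granularity would typically be queried roughly
 * like this (not compiled here, since it needs cuda.h and a device):
 *
 *   CUmemAllocationProp prop = {0};
 *   prop.type = CU_MEM_ALLOCATION_TYPE_PINNED;
 *   prop.location.type = CU_MEM_LOCATION_TYPE_DEVICE;
 *   prop.location.id = device_id;
 *   size_t granularity;
 *   result = cuMemGetAllocationGranularity(
 *       &granularity, &prop, CU_MEM_ALLOC_GRANULARITY_MINIMUM);
 */

/* Hypothetical helper: round `size` up to the nearest multiple of
 * `granularity`. Returns 0 if granularity is 0 or if the rounded
 * value would overflow size_t. */
size_t round_up_to_granularity(size_t size, size_t granularity)
{
    if (granularity == 0) { return 0; }
    size_t remainder = size % granularity;
    if (remainder == 0) { return size; }
    size_t padding = granularity - remainder;
    if (size > (size_t)-1 - padding) { return 0; } /* would overflow */
    return size + padding;
}
```

With the numbers from this question, `round_up_to_granularity(0x20000, 0x200000)` yields 0x200000, and that rounded value is what would be passed as the size to cuMemAddressReserve().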
I have filed this as a documentation bug with NVIDIA, bug 3486420 (but you may not be able to follow the link, because NVIDIA hide their bugs from their users).