In a project I'm using the NVML library to get info about the GPU in a system. I use it to query the GPU name and GPU UUID. This happens cyclically, 6 to 8 times per minute. I noticed a small memory leak which crashes my application after a few hours. The way I am using NVML to query the GPU device name is as follows:
nvmlReturn_t result = nvmlInit();
nvmlDevice_t device;
result = nvmlDeviceGetHandleByIndex(deviceNum, &device);
char nameBuffer[NVML_DEVICE_NAME_BUFFER_SIZE];
result = nvmlDeviceGetName(device, nameBuffer, NVML_DEVICE_NAME_BUFFER_SIZE);
result = nvmlShutdown();
But even if I reduce the code to just an init and shutdown of NVML, the used memory still increases constantly:
nvmlReturn_t result = nvmlInit();
// nvmlDevice_t device;
// result = nvmlDeviceGetHandleByIndex(deviceNum, &device);
// char nameBuffer[NVML_DEVICE_NAME_BUFFER_SIZE];
// result = nvmlDeviceGetName(device, nameBuffer, NVML_DEVICE_NAME_BUFFER_SIZE);
result = nvmlShutdown();
Am I using the API correctly, or is something wrong? Is there a known issue in the NVML library?
System info:
OS: Windows 10
Nvidia Driver: 536.40
Cuda: 12.2
Here is the simple solution to the problem:
I tested with the simple C file from Homer512 from the comments (just init and shutdown NVML in a loop). Over time the test system ran out of memory.
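For reference, a minimal sketch of that kind of loop test (my own reconstruction, not Homer512's exact file; the loop count is an arbitrary assumption). Running it and watching the process in Task Manager makes the growth visible:

/* Sketch: repeatedly initialize and shut down NVML and watch process memory.
 * Loop count is arbitrary; this is not the original test file. */
#include <stdio.h>
#include <nvml.h>

int main(void)
{
    for (int i = 0; i < 100000; ++i) {
        nvmlReturn_t result = nvmlInit();
        if (result != NVML_SUCCESS) {
            fprintf(stderr, "nvmlInit failed: %s\n", nvmlErrorString(result));
            return 1;
        }
        result = nvmlShutdown();
        if (result != NVML_SUCCESS) {
            fprintf(stderr, "nvmlShutdown failed: %s\n", nvmlErrorString(result));
            return 1;
        }
    }
    return 0;
}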
Then I updated the Nvidia driver to the latest version (556.12). This seems to fix the issue.
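If you want to confirm at runtime which driver NVML actually sees (for example after an update), the version can be queried through the documented nvmlSystemGetDriverVersion call. A minimal sketch:

/* Sketch: print the driver version reported by NVML. */
#include <stdio.h>
#include <nvml.h>

int main(void)
{
    char version[NVML_SYSTEM_DRIVER_VERSION_BUFFER_SIZE];

    if (nvmlInit() != NVML_SUCCESS)
        return 1;
    if (nvmlSystemGetDriverVersion(version, sizeof(version)) == NVML_SUCCESS)
        printf("NVIDIA driver version: %s\n", version);  /* e.g. "556.12" */
    nvmlShutdown();
    return 0;
}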