
How to precisely measure the GPU memory usage of an application (OpenACC + Managed Memory)


What is the most precise method to measure the GPU memory usage of an application that uses OpenACC with Managed Memory? I used two methods. The first is

nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.85.02    Driver Version: 510.85.02    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla v100     ...  Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   51C    P5    11W /  N/A | 10322MiB /  16160MiB |     65%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      2670      G   ./myapp                           398MiB |
+-----------------------------------------------------------------------------+

In this output, what is the difference between the memory usage shown at the top (10322MiB / 16160MiB) and the per-process value shown at the bottom (./myapp 398MiB)?
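To make the two numbers concrete: the top "Memory-Usage" cell is the memory in use on the whole device, while the process table lists the memory attributed to each process it shows. The sketch below uses hypothetical helpers (`DeviceMem`, `parse_query_gpu_line`, `unattributed_mib`, nothing that nvidia-smi itself provides). It parses the device-wide line printed by `nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits`, assumed here to look like "10322, 16160" (values in MiB), and subtracts the per-process figures. With the values above, 10322 - 398 = 9924 MiB are in use on the GPU but not attributed to ./myapp in this listing. This output alone cannot tell you which other processes or driver reservations account for that memory.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Device-wide figures in MiB, as in the "Memory-Usage" column of nvidia-smi.
struct DeviceMem {
    unsigned long used_mib;
    unsigned long total_mib;
};

// Parses one line such as "10322, 16160", assumed to come from
//   nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits
DeviceMem parse_query_gpu_line(const std::string& line) {
    std::istringstream in(line);
    DeviceMem m{0, 0};
    char comma = 0;
    in >> m.used_mib >> comma >> m.total_mib;
    assert(comma == ',' && m.used_mib <= m.total_mib);
    return m;
}

// Memory in use on the device that is not attributed to any of the listed
// per-process values (clamped at zero).
unsigned long unattributed_mib(const DeviceMem& dev,
                               const std::vector<unsigned long>& per_process_mib) {
    unsigned long attributed = 0;
    for (unsigned long p : per_process_mib) attributed += p;
    return attributed <= dev.used_mib ? dev.used_mib - attributed : 0;
}
```

Capturing such a line periodically during a run gives a time series of device-wide usage that can be compared with the per-process figure.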

The other method I used is:

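#include <openacc.h>
#include <cstddef>

static std::size_t max_mem_usage = 0;  // peak observed device usage, in bytes

void measure_acc_mem_usage() {
    // Query the device currently in use rather than assuming device 0.
    acc_device_t dev_ty = acc_get_device_type();
    int dev_num = acc_get_device_num(dev_ty);
    std::size_t dev_mem = acc_get_property(dev_num, dev_ty, acc_property_memory);
    std::size_t dev_free_mem = acc_get_property(dev_num, dev_ty, acc_property_free_memory);
    std::size_t mem = dev_mem - dev_free_mem;  // bytes in use on the whole device (all processes)
    if (mem > max_mem_usage)
        max_mem_usage = mem;
}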

I call this function many times during program execution and keep the maximum value in max_mem_usage.
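The sampling logic can be separated from the OpenACC calls so it can be checked without a GPU. In this sketch `PeakUsageTracker` is a hypothetical class. In the real program, its two injected queries would wrap `acc_get_property(..., acc_property_memory)` and `acc_get_property(..., acc_property_free_memory)`. It also converts the peak to MiB, because `acc_get_property` reports bytes while nvidia-smi reports MiB. Note that total minus free is device-wide, so it includes other processes. It also only records the peak at the moments you sample.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>

// Hypothetical helper: tracks the peak of (total - free) device memory.
// The queries are injected so the logic can run without a GPU; in an OpenACC
// program they could wrap acc_get_property(dev, type, acc_property_memory)
// and acc_get_property(dev, type, acc_property_free_memory).
class PeakUsageTracker {
public:
    using Query = std::function<std::size_t()>;

    PeakUsageTracker(Query total_bytes, Query free_bytes)
        : total_(std::move(total_bytes)), free_(std::move(free_bytes)) {}

    // Takes one sample, updates the peak and returns the current usage in bytes.
    std::size_t sample() {
        const std::size_t total = total_();
        const std::size_t free_now = free_();
        assert(free_now <= total);
        const std::size_t used = total - free_now;
        peak_ = std::max(peak_, used);
        return used;
    }

    std::size_t peak_bytes() const { return peak_; }

    // acc_get_property reports bytes; nvidia-smi reports MiB (1 MiB = 2^20 bytes).
    double peak_mib() const {
        return static_cast<double>(peak_) / (1024.0 * 1024.0);
    }

private:
    Query total_;
    Query free_;
    std::size_t peak_ = 0;
};
```

Injecting the queries keeps the peak and unit logic testable on its own. The OpenACC-specific part is reduced to two one-line lambdas.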

Neither method seems to reflect the actual behaviour of the device. I base this on when saturation seems to occur, i.e. when the application starts running very slowly as the problem size increases. The two methods also report very different values: for example, where the second method indicates 2GB of memory usage, nvidia-smi says 16GB.


Solution

  • I'm not sure you'll be able to get a precise value of memory usage when using CUDA Unified Memory (aka managed memory). The nvidia-smi utility will only show memory allocated with cudaMalloc, and the OpenACC property function uses cudaMemGetInfo, which isn't accurate for UM.

    Bob gives a good explanation of why here: "CUDA unified memory pages accessed in CPU but not evicted from GPU"
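Since neither reading reflects managed-memory residency precisely, one workaround is to count the bytes the application itself allocates. This technique is not taken from the linked answer. Assume heap allocations end up in managed memory, as with NVHPC's `-gpu=managed` option. Under that assumption, the running total approximates the managed footprint: how much data may need to be on the GPU, not how much is resident at a given moment. Compare its peak with the device capacity (16160 MiB in the output above). If the footprint exceeds that capacity and the kernels touch all of it, pages have to be migrated, which is likely to cause slowdowns. `AllocationCounter` and its methods are hypothetical names for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <unordered_map>

// Hypothetical bookkeeping for the application's own allocations. Assuming
// they are placed in managed memory, current_bytes() approximates the managed
// footprint (data that may need to be on the GPU), not the resident amount.
class AllocationCounter {
public:
    void* allocate(std::size_t bytes) {
        void* p = std::malloc(bytes);
        if (p != nullptr) {
            sizes_[p] = bytes;
            current_ += bytes;
            if (current_ > peak_) peak_ = current_;
        }
        return p;
    }

    void release(void* p) {
        auto it = sizes_.find(p);
        assert(it != sizes_.end() && "pointer was not returned by allocate()");
        if (it == sizes_.end()) return;  // ignore unknown pointers in release builds
        current_ -= it->second;
        sizes_.erase(it);
        std::free(p);
    }

    std::size_t current_bytes() const { return current_; }
    std::size_t peak_bytes() const { return peak_; }

private:
    std::unordered_map<void*, std::size_t> sizes_;
    std::size_t current_ = 0;
    std::size_t peak_ = 0;
};
```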