Tags: cuda, nvidia, blas, cublas, magma

Why does the magma_dgemm function not use tensor cores on the V100 GPU?


I ran the MAGMA testing_dgemm code on both a V100 and an H100 GPU. Using Nsight Systems, I found that the code does not use Tensor Cores on the V100, but it does on the H100.

V100 result:

[Nsight Systems profiler screenshot]

H100 result:

[Nsight Systems profiler screenshot]

According to NVIDIA's website, Tensor Cores were already present in Volta GPUs.

However, NVIDIA's "Inside Volta" blog post does not seem to mention FP64 Tensor Core performance.


Solution

  • The V100 GPU doesn't have an FP64 (double-precision) path in its Tensor Core units.

    That capability was introduced with the third-generation Tensor Cores of the Ampere A100.

    So for FP64 arithmetic such as DGEMM, the V100 will generally not use its Tensor Cores.

    From here:

    NVIDIA A100 introduces double precision Tensor Cores ...

    (emphasis added)
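One way to tell which case applies on a given machine is to look at the GPU's CUDA compute capability: the V100 is 7.0, the A100 is 8.0 and the H100 is 9.0, and the FP64 MMA instructions that FP64 Tensor Cores execute require 8.0 or newer. The sketch below encodes that rule as a simple heuristic. The function name `fp64_tensor_cores_possible` and the `GPUS` table are illustrative assumptions, not part of MAGMA or cuBLAS. On a real system the (major, minor) pair can be read from the `major`/`minor` fields filled in by `cudaGetDeviceProperties`, or from Python via `torch.cuda.get_device_capability()`.

```python
from typing import NamedTuple


class ComputeCapability(NamedTuple):
    """CUDA compute capability, e.g. (7, 0) for a V100."""
    major: int
    minor: int


# Compute capabilities of the GPUs discussed in this thread (illustrative table).
GPUS = {
    "V100": ComputeCapability(7, 0),  # Volta: Tensor Cores take FP16 inputs only
    "A100": ComputeCapability(8, 0),  # Ampere: adds FP64 Tensor Cores
    "H100": ComputeCapability(9, 0),  # Hopper: keeps FP64 Tensor Cores
}


def fp64_tensor_cores_possible(cc: ComputeCapability) -> bool:
    """Heuristic: FP64 MMA instructions need compute capability 8.0 or newer.

    This only says the hardware path exists. Whether cuBLAS/MAGMA actually
    picks a Tensor Core DGEMM kernel, and how fast the FP64 path is, depends
    on the specific GPU model and library version (assumption, not checked here).
    """
    return cc.major >= 8


if __name__ == "__main__":
    for name, cc in GPUS.items():
        if fp64_tensor_cores_possible(cc):
            path = "FP64 Tensor Cores possible"
        else:
            path = "no FP64 Tensor Core path"
        print(f"{name} (sm_{cc.major}{cc.minor}): {path}")
```

The check deliberately looks only at the major version: it answers "can this GPU execute FP64 Tensor Core instructions at all", which is exactly the V100 vs. H100 difference seen in the profiler, and leaves kernel selection to the library.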