PyTorch Total CUDA time

The autograd profiler is a handy tool for measuring execution time in PyTorch, as shown below:

import torch
import torchvision.models as models

model = models.densenet121(pretrained=True)
x = torch.randn((1, 3, 224, 224), requires_grad=True)

with torch.autograd.profiler.profile(use_cuda=True) as prof:
    model(x)
print(prof)

The output looks like this:

-----------------------------------  ---------------  ---------------  ---------------  ---------------  ---------------
Name                                        CPU time        CUDA time            Calls        CPU total       CUDA total
-----------------------------------  ---------------  ---------------  ---------------  ---------------  ---------------
conv2d                                    9976.544us       9972.736us                1       9976.544us       9972.736us
convolution                               9958.778us       9958.400us                1       9958.778us       9958.400us
_convolution                              9946.712us       9947.136us                1       9946.712us       9947.136us
contiguous                                   6.692us          6.976us                1          6.692us          6.976us
empty                                       11.927us         12.032us                1         11.927us         12.032us

The full output contains many more lines. My questions are:

1) How can I use the autograd profiler to get the total CUDA time (i.e., the sum of the CUDA time column)?

2) Is there a way to access it programmatically? For example, something like prof[0].CUDA_Time?

Solution

[item.cuda_time for item in prof.function_events]

gives you a list of CUDA times, which you can adapt to your needs. For example, to get the sum of the CUDA times:

sum([item.cuda_time for item in prof.function_events])

Be careful, though: the times in the list are in microseconds, whereas print(prof) may display them in another unit such as milliseconds (the table above happens to show microseconds).
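To make the unit handling explicit, here is a minimal sketch that sums the per-event CUDA times and converts the result to milliseconds. The helper name total_cuda_time_ms and the stand-in events are hypothetical and only illustrate the idea; the two sample values are copied from the table in the question, and the sketch assumes each event exposes a cuda_time attribute in microseconds, as described above.

```python
from types import SimpleNamespace


def total_cuda_time_ms(events):
    """Sum per-event CUDA times (assumed to be in microseconds) and return milliseconds.

    Hypothetical helper, not part of PyTorch.
    """
    total_us = sum(event.cuda_time for event in events)
    return total_us / 1000.0


# Stand-in objects mimicking profiler events, so the sketch runs without a GPU.
# The values are the "contiguous" and "empty" rows from the table in the question.
fake_events = [
    SimpleNamespace(name="contiguous", cuda_time=6.976),
    SimpleNamespace(name="empty", cuda_time=12.032),
]

total_ms = total_cuda_time_ms(fake_events)
print(f"Total CUDA time: {total_ms:.6f} ms")
```

With a real profiler run you would pass prof.function_events instead of fake_events. Note that conv2d, convolution and _convolution in the table report nearly identical times, which suggests they are nested calls, so a plain sum over all events probably counts the same work more than once. If that matters for your measurement, consider filtering the events (for example by item.name) before summing.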