
Nsight Compute can't profile Waveglow (PyTorch application)


I tried to profile https://github.com/NVIDIA/waveglow with this command:

nv-nsight-cu-cli --export ./nsight_output ~/.virtualenvs/waveglow/bin/python3 inference.py -f <(ls mel_spectrograms/*.pt) -w waveglow_256channels.pt -o . --is_fp16 -s 0.6

The Python command comes from the instructions at https://github.com/NVIDIA/waveglow#generate-audio-with-our-pre-existing-model . It works under Nsight Systems, but not under Nsight Compute.

Profiling never finishes; it just keeps printing the log below, so I pressed Ctrl+C. Also, it profiles only one kernel, although the application launches more kernels (I checked with Nsight Systems).

...
==PROF== Profiling "weight_norm_fwd_first_dim_ker..." - 286: 0%....50%....100% - 48 passes
==PROF== Profiling "weight_norm_fwd_first_dim_ker..." - 287: 0%....50%....100% - 48 passes
==PROF== Profiling "weight_norm_fwd_first_dim_ker..." - 288: 0%....50%....100% - 48 passes
==PROF== Profiling "weight_norm_fwd_first_dim_ker..." - 289: 0%....50%....100% - 48 passes
==PROF== Profiling "weight_norm_fwd_first_dim_ker..." - 290: 0%....50%....100% - 48 passes
==PROF== Profiling "weight_norm_fwd_first_dim_ker..." - 291: 0%....50%....100% - 48 passes
==PROF== Profiling "weight_norm_fwd_first_dim_ker..." - 292: 0%....50%....100% - 48 passes
==PROF== Profiling "weight_norm_fwd_first_dim_ker..." - 293: 0%....50%....100% - 48 passes
==PROF== Profiling "weight_norm_fwd_first_dim_ker..." - 294: 0%....50%....100% - 48 passes
==PROF== Profiling "weight_norm_fwd_first_dim_ker..." - 295: 0%....50%....100% - 48 passes
==PROF== Profiling "weight_norm_fwd_first_dim_ker..." - 296: 0%....50%...^C
==PROF== Received signal, trying to shutdown target application
 - 43 passes
==ERROR== Failed to profile kernel "weight_norm_fwd_first_dim_ker..." in process
==ERROR== An error occurred while trying to profile.
==ERROR== An error occurred while trying to profile
==PROF== Report: nsight_compute_result.nsight-cuprof-report

OS: CentOS Linux 7, Nsight Compute (2019.3.1, Build 26317742), GPU: Tesla V100-PCIE-32GB

How can I fix this?


Solution

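  • I don't think there is an error here; the tool behaves as expected. It is not profiling only one kernel: your log shows that it has already profiled 296 kernel launches, which all appear to come from the same kernel function. Each of those launches is replayed in 48 passes to collect the metrics, so the run is slow but still making progress.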

    You can limit how many and which kernels are profiled with options such as --launch-count or --kernel-regex. You can also choose the metrics collected for each kernel with --metrics and --section; collecting fewer metrics reduces the overhead of the tool.

    See https://docs.nvidia.com/nsight-compute/NsightComputeCli/index.html#command-line-options for more available command line options.
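
As a rough illustration of the options above, here is a minimal sketch of a restricted run. The option names are the ones mentioned in this answer; the values (the regex `weight_norm`, a limit of 10 launches, the `SpeedOfLight` section) and the list file `mel_files.txt` are assumptions chosen for the example, not something taken from the question. The script only prints the command unless `RUN_PROFILE=1` is set, so it can be reviewed first.

```shell
#!/bin/sh
# Sketch: restrict what Nsight Compute profiles so the run finishes sooner.
# Values below are assumptions for illustration; adjust them to your case.

kernel_regex='weight_norm'   # assumed pattern; matches the kernel seen in the log
launch_count=10              # assumed limit on the number of profiled launches
section='SpeedOfLight'       # assumed section; confirm with --list-sections

# Build the command as positional parameters so each value stays one argument.
# The question's <(ls mel_spectrograms/*.pt) is replaced by a hypothetical
# list file, mel_files.txt, which is created just before the real run.
set -- nv-nsight-cu-cli \
  --export ./nsight_output \
  --kernel-regex "$kernel_regex" \
  --launch-count "$launch_count" \
  --section "$section" \
  ~/.virtualenvs/waveglow/bin/python3 inference.py \
  -f mel_files.txt -w waveglow_256channels.pt -o . --is_fp16 -s 0.6

# Show the full command for review.
cmd_line="$*"
echo "$cmd_line"

# Run it only when explicitly requested and when the tool is on PATH.
if [ "${RUN_PROFILE:-0}" = 1 ] && command -v nv-nsight-cu-cli >/dev/null 2>&1; then
  ls mel_spectrograms/*.pt > mel_files.txt
  "$@"
fi
```

Saved under a name of your choice and started with `RUN_PROFILE=1`, this should stop after the first 10 matching launches instead of profiling every launch in the application. Option names can differ between Nsight Compute releases, so check `nv-nsight-cu-cli --help` for the version you have installed.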