This question is almost the same as "How to profile PyCuda code with the Visual Profiler?", except that it is about the new NVIDIA Nsight IDE with CUDA 5 for Linux.
I have a PyCUDA Python script that I'd like to profile using fancy Nsight.
I set up a Build External Tools Configuration pointing to the example script (included below, with executable permissions). I can run this and see the printouts in the Console. When I then switch to Profile mode and click Run -> Profile, I still see the printouts in the Console, but no profiler information is visible. How do I get the timing plots, occupancy calculators, and NVIDIA's suggestions for my code that appear when I run a C/CUDA program in Nsight?
Total IDE noob here (mostly command-line), sorry if my question doesn't include key information. Ubuntu 11.10, PyCUDA 2012.1.
example.py:
#!/usr/bin/env python
import pycuda.autoinit
import pycuda.driver as drv
import numpy
from pycuda.compiler import SourceModule
mod = SourceModule("""
__global__ void multiply_them(float *dest, float *a, float *b)
{
    const int i = threadIdx.x;
    dest[i] = a[i] * b[i];
}
""")
multiply_them = mod.get_function("multiply_them")
a = numpy.random.randn(400).astype(numpy.float32)
b = numpy.random.randn(400).astype(numpy.float32)
dest = numpy.zeros_like(a)
multiply_them(
        drv.Out(dest), drv.In(a), drv.In(b),
        block=(400,1,1), grid=(1,1))
print "error:", numpy.sum(numpy.abs(dest - a*b).ravel())
print "Done"
#pycuda.autoinit.context.detach() # seems to break PyCUDA 2012.1
I used nvvp to get the timeline and the program analysis. Just chmod 755 the script, add a #!/usr/bin/env python at the top, and give it to nvvp.
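If you would rather do the preparation from Python than by hand, the two steps can be sketched as below. This is only a minimal sketch: prepare_for_nvvp is a hypothetical helper name, not part of PyCUDA or nvvp, and it assumes the script is a plain text file that you are allowed to modify in place.

```python
import os

SHEBANG = "#!/usr/bin/env python"


def prepare_for_nvvp(path):
    """Hypothetical helper: make a script directly executable.

    Adds a shebang line if the file does not already start with one,
    then sets the permissions to 755 (same effect as `chmod 755 path`).
    """
    with open(path) as f:
        source = f.read()
    if not source.startswith("#!"):
        # Prepend the interpreter line so the OS knows how to run the file.
        with open(path, "w") as f:
            f.write(SHEBANG + "\n" + source)
    os.chmod(path, 0o755)
    return path
```

Calling prepare_for_nvvp("example.py") once leaves the script ready to be given to nvvp as the executable to profile.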