How to profile PyCUDA code with NVIDIA Nsight on Linux?

This question is almost the same as "How to profile PyCuda code with the Visual Profiler?", except that it is about the new NVIDIA Nsight IDE that ships with CUDA 5 for Linux.

I have a PyCUDA Python script that I'd like to profile using the fancy Nsight tools.

I set up a Build External Tools Configuration pointing to the example script (which has executable permissions and is included below). I can run this and see the printouts in the Console. Then I switch to Profile mode and click Run -> Profile. Again I see the printouts in the Console, but no profiler information appears. How do I get the timing plots, the occupancy calculator, and NVIDIA's suggestions for my code that appear when I run a C/CUDA program in Nsight?

Total IDE noob here (I mostly work on the command line), so sorry if my question leaves out key information. Setup: Ubuntu 11.10, PyCUDA 2012.1.

[Nsight screenshot]

example.py:

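```python
#!/usr/bin/env python
from __future__ import print_function  # print() behaves the same on Python 2 and 3

import pycuda.autoinit  # initializes CUDA and creates a context on a default device
import pycuda.driver as drv
import numpy

from pycuda.compiler import SourceModule

# Each thread multiplies one pair of elements; a single block covers the arrays.
mod = SourceModule("""
__global__ void multiply_them(float *dest, float *a, float *b)
{
  const int i = threadIdx.x;
  dest[i] = a[i] * b[i];
}
""")

multiply_them = mod.get_function("multiply_them")

a = numpy.random.randn(400).astype(numpy.float32)
b = numpy.random.randn(400).astype(numpy.float32)

dest = numpy.zeros_like(a)
# drv.In copies a host array to the device before the launch;
# drv.Out copies the device result back into `dest` afterwards.
multiply_them(
        drv.Out(dest), drv.In(a), drv.In(b),
        block=(400, 1, 1), grid=(1, 1))

print("error:", numpy.sum(numpy.abs(dest - a*b).ravel()))
print("Done")
# pycuda.autoinit.context.detach()  # seems to break PyCUDA 2012.1
```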

Solution

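I used nvvp (the standalone NVIDIA Visual Profiler) to get the timeline and the program analysis. Just make the script executable with chmod 755, make sure its first line is the #!/usr/bin/env python shebang, and give it to nvvp as the executable to profile. nvvp launches the file directly as a program, so the executable bit and the shebang are what tell the OS to run it with Python.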
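The steps above can be wrapped in a small Python launcher. This is a minimal sketch, assuming `nvvp` is on your PATH and accepts the executable to profile (plus its arguments) on the command line, as in `nvvp ./myapp`. The helper names `ensure_shebang`, `make_executable`, `nvvp_command`, and `launch_nvvp` are hypothetical, not part of PyCUDA or the CUDA toolkit.

```python
import os
import subprocess

# Shebang assumed to be the one from the answer above.
SHEBANG = "#!/usr/bin/env python"


def ensure_shebang(path):
    """Prepend SHEBANG if the script does not already start with '#!'."""
    with open(path) as f:
        source = f.read()
    if not source.startswith("#!"):
        with open(path, "w") as f:
            f.write(SHEBANG + "\n" + source)
    return path


def make_executable(path):
    """Equivalent of `chmod 755 path`."""
    os.chmod(path, 0o755)
    return path


def nvvp_command(path, *args):
    """Build the argv list; assumes nvvp takes '<executable> [args...]'."""
    return ["nvvp", os.path.abspath(path)] + list(args)


def launch_nvvp(path, *args):
    """Prepare the script, then open it in nvvp (blocks until nvvp exits)."""
    make_executable(ensure_shebang(path))
    return subprocess.call(nvvp_command(path, *args))
```

Usage would be something like `launch_nvvp("example.py")`. The shebang check is deliberately loose (any `#!` line is kept), so running the helper twice does not stack shebangs.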