What are some possible causes of a segmentation fault when using the nvcc CUDA compiler?

I have a CUDA class, let's call it A, defined in a header file. I have written a test kernel that creates an instance of class A; it compiles fine and produces the expected result.

In addition, I have my main CUDA kernel, which also compiles fine and produces the expected result. However, when I add code to my main kernel that creates an instance of class A, the nvcc compiler fails with a segmentation fault.

Update:

To clarify, the segmentation fault happens during compilation, not when running the kernel. The line I am using to compile is:

`nvcc --cubin -arch compute_20 -code sm_20 -I<My include dir> --keep kernel.cu`

where <My include dir> is the path to my local directory containing some utility header files.
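The single command above runs both halves of the build: compiling kernel.cu to PTX, then assembling that PTX into a cubin with ptxas. A minimal sketch of running the two stages separately, so that a crash can be pinned on one of them, assuming a POSIX shell and a 64-bit host. The function name `build_in_stages` and the `NVCC`/`PTXAS` override variables are hypothetical; the flags are chosen to match `-arch compute_20 -code sm_20` (note that recent CUDA releases no longer support sm_20).

```shell
# Hypothetical helper: build a .cu file in two explicit stages so a crash can be
# attributed to the nvcc front end (.cu -> .ptx) or to ptxas (.ptx -> .cubin).
# NVCC and PTXAS can be overridden, e.g. to point at a specific CUDA install.
NVCC=${NVCC:-nvcc}
PTXAS=${PTXAS:-ptxas}

build_in_stages() {
    src=$1      # e.g. kernel.cu
    incdir=$2   # directory with the utility headers
    ptx=${src%.cu}.ptx
    cubin=${src%.cu}.cubin

    # Stage 1: CUDA C++ -> PTX for the compute_20 virtual architecture.
    if ! "$NVCC" --ptx -arch=compute_20 -I"$incdir" "$src" -o "$ptx"; then
        echo "stage 1 failed: nvcc front end (.cu -> .ptx)"
        return 1
    fi

    # Stage 2: PTX -> cubin for sm_20; -v prints per-kernel resource usage.
    # A segfault here shows up as a non-zero status (139 = 128 + 11, SIGSEGV).
    if ! "$PTXAS" -arch=sm_20 -m64 -v "$ptx" -o "$cubin"; then
        echo "stage 2 failed: ptxas (.ptx -> .cubin)"
        return 2
    fi

    echo "both stages succeeded: $cubin"
}
```

Usage would be along the lines of `build_in_stages kernel.cu "$INCLUDE_DIR"`, where `INCLUDE_DIR` stands in for the header directory. Keeping the stages separate also means a crashing PTX file can be re-assembled repeatedly without recompiling the C++ source.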

My question is: before I spend a lot of time isolating a minimal example that exhibits the behaviour (not trivial, given the relatively large code base), has anyone encountered a similar issue? Is it possible for the nvcc compiler to crash if the kernel is too long or uses too many registers?

If an issue such as register count can affect the compiler this way, then I will need to rethink how to implement my kernel so that it uses fewer resources. This would also mean that trimming things down to a minimal example will likely make the problem disappear. However, if this is not even a possibility, I don't want to waste time on a dead end; I will instead try to cut things down to a minimal example and file a bug report with NVIDIA.
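One way to gauge whether register pressure is even plausible is to check how many registers the main kernel already uses without class A. Passing `-Xptxas -v` to nvcc makes ptxas report per-kernel resource usage. On compute capability 2.x the limit is 63 registers per thread, and when a kernel needs more, ptxas normally spills the excess to local memory rather than failing. A minimal sketch of a hypothetical helper, `max_registers`, that extracts the largest `Used N registers` figure from that report, assuming lines of that form (the exact wording may vary between CUDA versions):

```shell
# Hypothetical helper: print the largest per-thread register count reported by
# ptxas -v (read from stdin), assuming lines like "... Used 20 registers, ...".
max_registers() {
    awk '
        {
            for (i = 1; i < NF; i++) {
                # Look for the pattern "Used <N> registers"
                if ($i == "Used" && $(i + 2) ~ /^registers/) {
                    n = $(i + 1) + 0
                    if (n > max) max = n
                }
            }
        }
        END { print max + 0 }   # 0 if no register usage lines were found
    '
}
```

For example, `nvcc --cubin -arch compute_20 -code sm_20 -Xptxas -v -I"$INCLUDE_DIR" kernel.cu 2>&1 | max_registers` (with `INCLUDE_DIR` standing in for the header directory) prints the highest count among the kernels in kernel.cu. Comparing the builds with and without class A, or capping registers with nvcc's `--maxrregcount=N`, are cheap experiments before committing to a redesign.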

Update:

Following @njuffa's suggestion, I reran the compilation with the `-v` flag enabled. The output ends with the following:

```
#$ ptxas  -arch=sm_20 -m64 -v  "/path/to/kernel_ptx/kernel.ptx"  -o "kernel.cubin"
Segmentation fault
# --error 0x8b --
```

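This indicates that the problem lies in ptxas, the PTX assembler, which crashes while generating a CUDA binary (cubin) from the PTX file. The `--error 0x8b` line is consistent with this: 0x8b is exit status 139, i.e. 128 + 11, where 11 is the signal number of SIGSEGV.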
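With `-v`, nvcc echoes each sub-command, prefixed with `#$ `, just before running it, so the last echoed command before the error is the one that crashed. A minimal sketch of a hypothetical helper, `failing_stage`, that pulls that tool name out of a saved log, assuming the `#$ ` prefix seen in the output above and skipping the `#$ VAR=value` environment lines nvcc also prints:

```shell
# Hypothetical helper: read an `nvcc -v` log on stdin and print the name of the
# last tool nvcc launched, i.e. the stage that was running when the build died.
failing_stage() {
    awk '
        # nvcc -v echoes each step as "#$ <command>" or "#$ VAR=value"
        /^#\$ / && $2 !~ /=/ { last = $2 }
        END {
            gsub(/"/, "", last)     # drop quotes around a quoted path
            sub(/.*\//, "", last)   # drop a directory prefix, keep the tool name
            print last
        }
    '
}
```

A typical use would be to capture the verbose output, for example `nvcc -v --cubin -arch compute_20 -code sm_20 -I"$INCLUDE_DIR" --keep kernel.cu > nvcc.log 2>&1`, and then run `failing_stage < nvcc.log`, which for the log above prints `ptxas`.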

Solution

This appears to have been a genuine bug in the ptxas assembler shipped with CUDA 5.0. It was reported to NVIDIA, and we can assume that it was fixed at some point during the more than three years between when the question was asked and when this answer was added.

[This answer has been assembled from comments and added as a community wiki entry to get this question off the unanswered question list.]