Error linking with CUDA code: multiple definitions of `__cudaRegisterLinkedBinary_

I have some CUDA code that I compile into a static (.a) library, and some regular C++ code (which makes CUDA-related calls) for an app that uses it. Both the library and the app go through an intermediate (device) link step.

Now, on one machine (with CUDA 8.0 RC) the build succeeds, but on another machine (with a Maxwell rather than a Kepler card, in case it matters) I get:

/tmp/tmpxft_00001796_00000000-2_ktkernels_intermediate_link.reg.c:25: multiple definition of `__cudaRegisterLinkedBinary_66_tmpxft_00007a5f_00000000_16_cuda_device_runtime_compute_52_cpp1_ii_8b1a5d37'
CMakeFiles/tester.dir/tester_intermediate_link.o:/tmp/tmpxft_0000180b_00000000-2_tester_intermediate_link.reg.c:4: first defined here
collect2: error: ld returned 1 exit status
CMakeFiles/tester.dir/build.make:1766: recipe for target 'bin/tester' failed
make[2]: *** [bin/tester] Error 1

I tried removing, one by one, the source files compiled into the binary that call into the library code. Linking succeeds only once all of them are removed.

My questions:

Under what circumstances is it possible for such inconsistent behavior to occur?
Can this possibly be the result of the "second linking" both for the library and for the binary?
What can I do to determine exactly what is actually in conflict (e.g. what symbols to look for in )?
If nothing is actually conflicting, what should I do to avoid this?

Notes:

On one machine I'm using CUDA 7.5, on the other machine it's CUDA 8.0 RC.

Solution

Under what circumstances is it possible for such inconsistent behavior to occur?

If you attempt multiple device linkages within a single application.

Can this possibly be the result of the "second linking" both for the library and for the binary?

Almost certainly.

What can I do to determine exactly what is actually in conflict (e.g. what symbols to look for in )?

The conflict is between multiple definitions of the boilerplate registration code that the toolchain generates during each device link phase, and which the runtime API uses to load device code into a context.
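
One way to see which objects carry the duplicated boilerplate is to list the defined symbols with nm and look for registration symbols that show up in more than one file. A minimal sketch, assuming GNU nm; the helper name dup_registrations and the file names in the usage comment are hypothetical:

```shell
# Reads `nm -A --defined-only` output on stdin and prints every
# __cudaRegisterLinkedBinary_* symbol that is defined more than once.
dup_registrations() {
  grep '__cudaRegisterLinkedBinary' |  # keep only the registration symbols
    awk '{ print $NF }' |              # last field of each nm line is the symbol name
    sort |
    uniq -d                            # print names that occur at least twice
}

# Usage (hypothetical file names):
#   nm -A --defined-only libktkernels.a CMakeFiles/tester.dir/*.o | dup_registrations
```

Dropping the final `sort | uniq -d` and keeping the whole nm line instead shows which archive member or object file each definition comes from.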

If nothing is actually conflicting, what should I do to avoid this?

The conflicts are real. Avoiding them involves properly linking the separately compiled device code. Beyond that, I can't tell you exactly how to fix it, because you have chosen not to tell us exactly what you were doing.
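
As a rough illustration of what a build with a single device link can look like when nvcc is driven by hand (the question uses CMake, so this is not the asker's actual setup). The file names, the run and build_tester helpers, and the library path are all assumptions; sm_52 is taken from the compute_52 in the error message:

```shell
# Print the command when DRY_RUN=1, otherwise execute it.
run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi; }

build_tester() {
  arch=${ARCH:-sm_52}                          # matches compute_52 in the error
  cuda_lib=${CUDA_LIB:-/usr/local/cuda/lib64}  # assumed install location

  # Library: compile to relocatable device code and archive it,
  # WITHOUT running a device link for the library itself.
  run nvcc -arch="$arch" -dc kernels.cu -o kernels.o &&
  run nvcc -lib kernels.o -o libkernels.a &&

  # Application: compile to relocatable device code as well.
  run nvcc -arch="$arch" -dc tester.cu -o tester.o &&

  # The one and only device link, covering the app objects and the library.
  run nvcc -arch="$arch" -dlink tester.o libkernels.a -o device_link.o &&

  # Ordinary host link with the single device-link object.
  run g++ tester.o device_link.o libkernels.a -L"$cuda_lib" -lcudart -o tester
}
```

Calling `DRY_RUN=1 build_tester` prints the five commands without invoking the compiler, which makes it easy to compare them against what the build system actually runs.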