How to split a class definition between multiple .cpp and .cu files?

I've got a class with nested classes that mixes C++, CUDA and Thrust. I want to split the member definitions across several files.

// In cls.h:
#include <thrust/device_vector.h>
class cls {
    class foo {   // define in foo.cu    (include "cls.h")
        kernelWrapper();
    }
    class bar {   // define in bar.cu    (include "cls.h")
        thrust::device_vector A;
        thrustStuff();
    }
    thrust::device_vector B;
    pureCPP();      // define in cls.cpp (include "cls.h")
    moreThrust();   // define in cls.cu  (include "cls.h")
}

In each definition file I simply #include "cls.h". However, I am currently getting an assortment of compiler errors no matter what I try, such as pureCPP being referenced but not defined.

I've read Thrust can only be used with .cu files. Because my parent class cls declares Thrust-type variables like B (and hence #includes thrust/device_vector.h), does that force all files that #include cls.h to be made into .cu files?
Where do I use extern "C" in this case? I suppose cls.cpp would require all functions in .cu files to be wrapped in extern "C", but what about .cu to .cu calls, like moreThrust() calling bar::thrustStuff()?
I've also been made aware that class members don't work with extern "C", so do I have to write an extern "C" wrapper function for each member function?

I'm utterly confused as to how to make this all work - what cocktail of #includes and extern "C"s do I need for each file?

Solution

Taking your small example, this compiled and ran fine for me:

/*
Inside File cls.h
*/
#pragma once
#include <thrust/device_vector.h>
#include <stdio.h>

class cls {

public:
    class foo {   // define in foo.cu    (include "cls.h")
    public:
        void kernelWrapper();
    };

    class bar {   // define in bar.cu    (include "cls.h")
        thrust::device_vector<int> A;
    public:
        void thrustStuff();
    };

public:
    void pureCPP();      // define in cls.cpp (include "cls.h")
    void moreThrust();   // define in cls.cu  (include "cls.h")

private:
    thrust::device_vector<int> B;
};

/*
Inside File foo.cu
*/
#include "cls.h"
void cls::foo::kernelWrapper()
{
    printf("kernelWrapper\n");
}

/*
Inside File bar.cu
*/
#include "cls.h"
void cls::bar::thrustStuff()
{
    printf("Thrust Stuff\n");
}

/*
Inside File cls.cpp
*/
#include "cls.h"
void cls::pureCPP()
{
    printf("pureCPP\n");
}

/*
Inside File cls.cu
*/
#include "cls.h"
void cls::moreThrust()
{
    printf("moreThrust\n");
}


/*
Inside File main.cpp
*/
#include "cls.h"
int main()
{
    cls a_class;
    a_class.pureCPP();
    a_class.moreThrust();

    cls::bar a_class_bar;
    a_class_bar.thrustStuff();

    cls::foo a_class_foo;
    a_class_foo.kernelWrapper();
}

Running this prints:

pureCPP
moreThrust
Thrust Stuff
kernelWrapper

If anything, I'd bet that you're using an IDE and it's not compiling all of your files. In that case the member declarations in your header are visible, but the linker never finds the corresponding definitions. Your exact compilation commands will be different, but for me (on Linux) I used:

nvcc -G -g -O0 -gencode arch=compute_20,code=sm_21 -odir "src" -M -o "src/bar.d" "../src/bar.cu"
nvcc --device-c -G -O0 -g -gencode arch=compute_20,code=sm_21  -x cu -o  "src/bar.o" "../src/bar.cu"

nvcc -G -g -O0 -gencode arch=compute_20,code=sm_21 -odir "src" -M -o "src/cls.d" "../src/cls.cu"
nvcc --device-c -G -O0 -g -gencode arch=compute_20,code=sm_21  -x cu -o  "src/cls.o" "../src/cls.cu"

nvcc -G -g -O0 -gencode arch=compute_20,code=sm_21 -odir "src" -M -o "src/foo.d" "../src/foo.cu"
nvcc --device-c -G -O0 -g -gencode arch=compute_20,code=sm_21  -x cu -o  "src/foo.o" "../src/foo.cu"

nvcc -G -g -O0 -gencode arch=compute_20,code=sm_21 -odir "src" -M -o "src/main.d" "../src/main.cpp"
nvcc -G -g -O0 --compile  -x c++ -o  "src/main.o" "../src/main.cpp"

nvcc -G -g -O0 -gencode arch=compute_20,code=sm_21 -odir "src" -M -o "src/clscpp.d" "../src/cls.cpp"
nvcc -G -g -O0 --compile  -x c++ -o  "src/clscpp.o" "../src/cls.cpp"

nvcc --relocatable-device-code=true -gencode arch=compute_20,code=sm_21 -link -o  "split_compilation"  ./src/bar.o ./src/cls.o ./src/foo.o ./src/clscpp.o ./src/main.o

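The idea is just to compile all of your source files and link them together. For example, if I didn't compile and link the cls.cpp file, I'd get a linker error on any calls to pureCPP. Note that none of this needs extern "C": nvcc compiles the host side of a .cu file as C++, so member functions defined in .cu files link against .cpp callers (and other .cu files) like any other C++ code.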

Also, note that if you're using actual device code, you'll have to specify __device__ and/or __host__ for your member functions. See this other SO question.
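One common way to handle those qualifiers in a header shared by .cpp and .cu files is to hide them behind a macro, so a plain host compiler never sees them. The sketch below is only an illustration of that idea, not something from the example above: the macro name CUDA_HD, the scale() member, and the scaleOnHost() helper are all hypothetical. It relies on nvcc defining __CUDACC__ while it compiles CUDA source.

```cpp
#include <cassert>

// CUDA_HD is a hypothetical macro name. nvcc defines __CUDACC__ while it
// compiles CUDA source, so the qualifiers only appear where they are
// understood; an ordinary C++ compile sees an empty macro.
#ifdef __CUDACC__
#define CUDA_HD __host__ __device__
#else
#define CUDA_HD
#endif

class cls {
public:
    class foo {
    public:
        // Hypothetical member: usable from host code everywhere and,
        // when the including file is built as CUDA, from kernels as well.
        CUDA_HD static int scale(int x, int factor) { return x * factor; }
    };
};

// Hypothetical host-side helper, as it might appear in a .cpp file.
inline int scaleOnHost(int x) {
    int result = cls::foo::scale(x, 3);
    assert(result == x * 3);  // sanity check on the host path
    return result;
}
```

A .cpp file can call scaleOnHost(7) as ordinary C++, while a .cu file including the same header could call cls::foo::scale from inside a kernel. Members that touch Thrust containers or launch kernels would still be plain host functions defined in a .cu file, as in the example above.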