Selectively compile headers and class functions in CUDA

I am trying to use my C++ classes within CUDA.

I have a class like this:

#include <string>
#include <stdlib.h>

class exampleClass{
    int i;
    __host__ __device__ exampleClass(int _i):i(_i){}
    __host__ __device__ void increment(){i++;}
    // this is the member nvcc rejects: it uses std::string in device code
    __host__ __device__ std::string outputMessage(){ return std::to_string(i); }
};

I have put this in a .cu file and set it to compile as CUDA C/C++.

This fails to compile with nvcc because CUDA device code doesn't support std::string.

What I'd like to do is keep the CUDA-capable functions and hide the host-only parts from the device compiler, by doing something like:

#ifndef __CUDA_ARCH__
#include <string>
#endif
#include <stdlib.h>

class exampleClass{
    int i;
    __host__ __device__ exampleClass(int _i):i(_i){}
    __host__ __device__ void increment(){i++;}
#ifndef __CUDA_ARCH__
    std::string outputMessage(){ return std::to_string(i); }
#endif
};

But I know this doesn't work... at least, it isn't working for me. nvcc rejects both the <string> include and, obviously, the function that uses the string type.

Apologies if the example isn't top-notch. In summary, what I'd like to do is have core class members executable on CUDA while maintaining the ability to have fancy host operations for analysis and output on the host side.

UPDATE: My end goal here is to have a base class that contains pointers to several polymorphic classes. This base class itself is going to be derivable. I thought this was possible in CUDA 5.0. Am I mistaken?

Solution

The following code builds, though I didn't run it:

#include <iostream>
#include <string>

class exampleClass{
    int i;
public:
    __host__ __device__ exampleClass(int _i):i(_i){}
    __host__ __device__ void increment(){i++;}

    // host-only member: std::string is never compiled for the device
    __host__ std::string outputMessage(){ return "asdf"; }
};

__global__ void testkernel(exampleClass *a, int IH, int IW)
{
    // global 2D coordinates of this thread
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    const int j = blockIdx.y * blockDim.y + threadIdx.y;

    if (i < IW && j < IH)
    {
        const int i_idx = i + j * IW;  // row-major linear index
        exampleClass* ptr = a + i_idx;
        ptr->increment();
    }
}

__host__ void test_function(exampleClass *a, int IH, int IW)
{
    // a must point to host memory here
    for (int i = 0; i < IW; i++)
        for (int j = 0; j < IH; j++)
        {
            const int i_idx = i + j * IW;
            exampleClass* ptr = a + i_idx;
            std::cout << ptr->outputMessage();
        }
}
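The point of the class above is that only the members marked __host__ __device__ are compiled for the GPU, while the std::string member stays host-only. If the same header should also build with a plain C++ compiler, one common idiom is to hide the qualifiers behind a macro keyed on __CUDACC__, which nvcc defines when it compiles a CUDA source file. The following is a minimal sketch of that idiom, not a tested CUDA build: the macro name HD and the accessor value() are hypothetical additions for illustration, and std::to_string assumes C++11.

```cpp
#include <string>

// __CUDACC__ is defined by nvcc for CUDA source files, in both the host
// and the device compilation pass, so the class looks the same in each.
#ifdef __CUDACC__
#define HD __host__ __device__   // hypothetical macro name
#else
#define HD                       // plain C++ compiler: qualifiers disappear
#endif

class exampleClass {
    int i;
public:
    HD exampleClass(int _i) : i(_i) {}
    HD void increment() { i++; }
    HD int value() const { return i; }   // hypothetical accessor

    // No qualifier means host-only under nvcc, so std::string is fine here.
    std::string outputMessage() const { return std::to_string(i); }
};
```

On the host, constructing exampleClass e(41), calling e.increment(), and then e.outputMessage() yields the string "42". Unlike a guard on __CUDA_ARCH__, this keeps the class declaration identical in the host and device passes.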

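Note that you'll have to copy the objects from device memory back to host memory (for example with cudaMemcpy) before calling test_function for this to "work" properly. If you try to do anything fancy with the classes (such as polymorphism, for example) this will probably blow up: calling a virtual function in device code on an object that was constructed on the host is undefined behavior.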
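One way to sidestep the polymorphism problem is to avoid virtual functions for anything that crosses the host/device boundary and dispatch on an explicit type tag instead. This is a swapped-in technique, not something from the code above, and only a sketch: Shape, ShapeKind, area, and the HD macro are all hypothetical names chosen for illustration.

```cpp
#ifdef __CUDACC__
#define HD __host__ __device__   // hypothetical macro name
#else
#define HD
#endif

// Hypothetical types, for illustration only.
enum class ShapeKind { Square, Rectangle };

// Plain data with a tag: no vtable pointer, so the bytes mean the same
// thing on the host and on the device.
struct Shape {
    ShapeKind kind;
    float a;
    float b;
};

// Dispatch on the tag instead of through a virtual call.
HD inline float area(const Shape& s) {
    switch (s.kind) {
        case ShapeKind::Square:    return s.a * s.a;
        case ShapeKind::Rectangle: return s.a * s.b;
    }
    return 0.0f;  // unreachable for valid tags
}
```

Because Shape is trivially copyable, an array of them can be copied to the device and back without any pointer fix-ups. The trade-off is that every new kind has to be added to the switch by hand.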