I am writing a CUDA raytracer and seem to be stuck on a weird problem. I am using CUDA 5.5 along with GCC 4.2.1 on Mac OS X, and GLM 0.9.4.4. Whenever I call my raycastFromCameraKernel function, I get this error:
Cuda error: Kernel failed!: OS call failed or operation not supported on this OS.
After some debugging, I think I have narrowed the problem down to the glm::normalize(temp) call. If I substitute my own normalize function for it, the code works fine. Interestingly, when I wrote a sample program using glm::normalize just to check that it was working, it compiled and ran properly!
Here is the code for the function that has the issue:
__host__ __device__ ray raycastFromCameraKernel(glm::vec2 resolution, float time, int x, int y, glm::vec3 eye, glm::vec3 view, glm::vec3 up, glm::vec2 fov)
{
glm::vec3 eyePoint = eye;
glm::vec3 V = up;
glm::vec3 W = view;
glm::vec3 U = glm::cross(V,W); // Peter Shirley, page 74 (creating orthonormal vectors)
float fovY = fov.y;
// distance to the near clip plane
float distance = (resolution.y / 2.0f) / tan(fovY);
float left = -resolution.x/2;
float right = resolution.x/2;
float top = resolution.y/2;
float bottom = -resolution.y/2;
float u = left + (right - left)*(x + 0.5)/resolution.x;
float v = bottom + (top - bottom)*(y + 0.5)/resolution.y;
ray r;
r.origin = eyePoint;
glm::vec3 temp = -1*distance*W + u*U + v*V;
r.direction = glm::normalize(temp);
return r;
}
Could someone please help?
So the problem was a divide-by-zero error: for particular values of distance, u, and v, the components of temp were very small (near zero), and glm::normalize divided by a length of (effectively) zero. I solved this by checking the length of temp before normalizing, and only normalizing when it was above a given threshold. That solved the problem.