Issue:
I'm trying to use my graphics card to do some computation using CUDAfy.NET. I've run two versions of my kernel now and I keep getting errors at specific intervals, i.e. every 2nd location in the array is 0.0 when it should be something much larger. Below is a table of what the GPU returns vs. what the correct value is. Note: I've read that comparing floats isn't ideal, but getting 0.0 when I should be getting something as large as 6.34419E17 seems wrong.
i            GPU    Correct value
16,777,217   0.0    6.34419E17
16,777,219   0.0    6.34419E17
...          ...    ...
From quickly scanning through them, the errors seem to occur at every 2nd i value.
Checked thus far:
I've also run the code below with a different start value, as I believed it might be an issue with the data, but I still get the same i value for each error.
I've also changed the order in which the memory is allocated on the GPU, but that doesn't seem to affect the results. Note: since I'm debugging in VS, I'm not explicitly clearing the memory on the GPU after I stop. Is this being cleared once I stop debugging? The error is still present after I restart my PC.
Graphics Card:
My graphics card is as follows: EVGA GTX 660 SC.
Code:
My kernel: (Note: I have several variables which aren't used below, but I haven't removed them since I wanted to remove one thing at a time in order to nail down what's causing this error.)
[Cudafy]
public static void WorkerKernelOnGPU(GThread thread, float[] value1, float[] value2, float[] value3, float[] dateTime, float[,] output)
{
    float threadIndex = thread.threadIdx.x;
    float blockIndex = thread.blockIdx.x;
    float threadsPerBlock = thread.blockDim.x;
    int tickPosition = (int)(threadIndex + (blockIndex * threadsPerBlock));
    //Check to ensure threads dont go out of range.
    if (tickPosition < dateTime.Length)
    {
        output[tickPosition, 0] = dateTime[tickPosition];
        output[tickPosition, 1] = -1;
    }
}
Below is the segment of code which I'm using to call the kernel and then check the results.
CudafyModule km = CudafyTranslator.Cudafy();
_gpu = CudafyHost.GetDevice(eGPUType.Cuda);
_gpu.LoadModule(km);
float[,] Output = new float[SDS.dateTime.Length,2];
float[] pm = new float[]{0.004f};
//Otherwise need to allocate, then specify the pointer in CopyToDevice so it knows which pointer to add data to
float[] dev_tpc = _gpu.CopyToDevice(pm);
float[] dev_p = _gpu.CopyToDevice(SDS.p);
float[] dev_s = _gpu.CopyToDevice(SDS.s);
float[,] dev_o = _gpu.CopyToDevice(Output);
float[] dev_dt = _gpu.CopyToDevice(SDS.dateTime);
dim3 grid = new dim3(20000, 1, 1);
dim3 block = new dim3(1024, 1, 1);
Stopwatch sw = new Stopwatch();
sw.Start();
_gpu.Launch(grid, block).WorkerKernelOnGPU(dev_tpc,dev_p, dev_s, dev_dt, dev_o);
_gpu.CopyFromDevice(dev_o, Output);
sw.Stop(); //0.29 seconds
string resultGPU = sw.Elapsed.ToString();
sw.Reset();
//Variables used to record errors.
bool failed = false;
float[,] wrongValues = new float[Output.Length, 3];
int counterError = 0;
//Check the GPU values are as expected. If not record GPU value, Expected value, position.
for (int i = 0; i < 20480000; i++)
{
    float gpuValue = Output[i, 0];
    if (SDS.dateTime[i] != gpuValue)
    {
        failed = true;
        wrongValues[counterError, 0] = gpuValue;
        wrongValues[counterError, 1] = SDS.dateTime[i];
        wrongValues[counterError, 2] = (float)i;
        counterError++;
    }
}
I only have a single graphics card at my disposal at the moment, so I can't quickly check whether it's an error with the card or not. The card is less than 8 months old and was new when bought.
Any ideas on what could be causing the above error?
Thanks for your time.
Edit: I just tried reducing my GTX 660 to the stock speeds of a 660. I'm still experiencing the error, though.
Edit 2: I've used _gpu.FreeMemory; to determine whether I was exceeding the card's memory. I still have 1,013,202,944 bytes left, though.
Edit 3: I've just changed the data type of the output array to long instead of float. I now seem to have just over 500 MB of free space on the card, yet I still get wrong results starting from the same value, i.e. i = 16,777,217. I guess this suggests it's possibly something to do with the index that's the issue?
float threadIndex = thread.threadIdx.x;
float blockIndex = thread.blockIdx.x;
float threadsPerBlock = thread.blockDim.x;
int tickPosition = (int)(threadIndex + (blockIndex * threadsPerBlock));
The issue was the fact that I was using float for threadIndex etc. Once this was changed to int, the issue was resolved.
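For anyone wondering why the failures start at exactly i = 16,777,217: a 32-bit float has a 24-bit significand, so it can represent every integer only up to 2^24 = 16,777,216. Between 2^24 and 2^25 the representable values are 2 apart, so an odd index computed in float gets rounded to an even neighbour and the odd slot is never written. Below is a minimal CPU-side sketch of that effect, not the kernel itself. The class and method names (FloatIndexDemo, FloatIndex, IntIndex) are made up for illustration, and the explicit (float) cast is there to force single-precision rounding on the CPU, which I'm assuming matches what the GPU does.

```csharp
using System;

public static class FloatIndexDemo
{
    // Mimics the original kernel's index computation, carried out in float.
    // The explicit (float) cast forces the sum to be rounded to single precision.
    public static int FloatIndex(int threadIdx, int blockIdx, int blockDim)
    {
        float threadIndex = threadIdx;
        float blockIndex = blockIdx;
        float threadsPerBlock = blockDim;
        float position = (float)(threadIndex + (blockIndex * threadsPerBlock));
        return (int)position;
    }

    // The same computation using int arithmetic, as in the fixed kernel.
    public static int IntIndex(int threadIdx, int blockIdx, int blockDim)
    {
        return threadIdx + (blockIdx * blockDim);
    }

    public static void Main()
    {
        // Block 16384 with 1024 threads per block starts at 16384 * 1024 = 2^24.
        for (int t = 0; t < 4; t++)
        {
            Console.WriteLine("thread {0}: int = {1}, float = {2}",
                t, IntIndex(t, 16384, 1024), FloatIndex(t, 16384, 1024));
        }
        // thread 0: int = 16777216, float = 16777216
        // thread 1: int = 16777217, float = 16777216
        // thread 2: int = 16777218, float = 16777218
        // thread 3: int = 16777219, float = 16777220
    }
}
```

Threads 1 and 3 end up writing to someone else's slot, so positions 16,777,217 and 16,777,219 keep their initial 0.0, which lines up with the table in the question.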
Time for this fool to get some time away from the pc.