An exception occurs when I try to find the 100,000th prime number using Alea GPU. The algorithm works fine for a smaller target, e.g. the 10,000th prime number.
I am using Alea v3.0.4, an NVIDIA GTX 970, and CUDA 9.2 drivers.
I am new to GPU programming. Any help would be greatly appreciated.
    long[] primeNumber = new long[1]; // nth prime number to find
    int n = 100000; // find the 100,000th prime number
    var worker = Gpu.Default; // GTX 970 CUDA v9.2 drivers
    long count = 0;
    worker.LongFor(count, n, x =>
    {
        long a = 2;
        while (count < n)
        {
            long b = 2;
            long prime = 1;
            while (b * b <= a)
            {
                if (a % b == 0)
                {
                    prime = 0;
                    break;
                }
                b++;
            }
            if (prime > 0)
                count++;
            a++;
        }
        primeNumber[0] = (a - 1);
    });
Here are the exception details:
    System.Exception occurred
      HResult=0x80131500
      Message=[CUDAError] CUDA_ERROR_LAUNCH_FAILED
      Source=Alea
      StackTrace:
       at Alea.CUDAInterop.cuSafeCall@2939.Invoke(String message)
       at Alea.CUDAInterop.cuSafeCall(cudaError_enum result)
       at A.cf5aded17df9f7cc4c132234dda010fa7.Copy@918-22.Invoke(Unit _arg9)
       at Alea.Memory.Copy(FSharpOption`1 streamOpt, Memory src, IntPtr srcOffset, Memory dst, IntPtr dstOffset, FSharpOption`1 lengthOpt)
       at Alea.ImplicitMemoryTrackerEntry.cdd2cd00c052408bcdbf03958f14266ca(FSharpFunc`2 c600c458623dca7db199a0e417603dff4, Object cd5116337150ebaa6de788dacd82516fa)
       at Alea.ImplicitMemoryTrackerEntry.c6a75c171c9cccafb084beba315394985(FSharpFunc`2 c600c458623dca7db199a0e417603dff4, Object cd5116337150ebaa6de788dacd82516fa)
       at Alea.ImplicitMemoryTracker.HostReadWriteBarrier(Object instance)
       at Alea.GlobalImplicitMemoryTracker.HostReadWriteBarrier(Object instance)
       at A.cf5aded17df9f7cc4c132234dda010fa7.clo@2359-624.Invoke(Object arg00)
       at Microsoft.FSharp.Collections.SeqModule.Iterate[T](FSharpFunc`2 action, IEnumerable`1 source)
       at Alea.Kernel.LaunchRaw(LaunchParam lp, FSharpOption`1 instanceOpt, FSharpList`1 args)
       at Alea.Parallel.Device.DeviceFor.For(Gpu gpu, Int64 fromInclusive, Int64 toExclusive, Action`1 op)
       at Alea.Parallel.GpuExtension.LongFor(Gpu gpu, Int64 fromInclusive, Int64 toExclusive, Action`1 op)
       at TestingGPU.Program.Execute(Int32 t) in C:\Users..\source\repos\TestingGPU\TestingGPU\Program.cs:line 148
       at TestingGPU.Program.Main(String[] args)
Working Solution:
    static void Main(string[] args)
    {
        var devices = Device.Devices;
        foreach (var device in devices)
        {
            // loop body not preserved in the post
        }
        while (true)
        {
            Console.WriteLine("Enter a number to check if it is a prime number:");
            string line = Console.ReadLine();
            long checkIfPrime = Convert.ToInt64(line);
            Stopwatch sw = new Stopwatch();
            sw.Start();
            bool GPUisPrime = GPUIsItPrime(checkIfPrime + 1);
            sw.Stop();
            Stopwatch sw2 = new Stopwatch();
            sw2.Start();
            bool CPUisPrime = CPUIsItPrime(checkIfPrime + 1);
            sw2.Stop();
            Console.WriteLine($"GPU: is {checkIfPrime} prime? {GPUisPrime} Time Elapsed: {sw.ElapsedMilliseconds.ToString()}");
            Console.WriteLine($"CPU: is {checkIfPrime} prime? {CPUisPrime} Time Elapsed: {sw2.ElapsedMilliseconds.ToString()}");
        }
    }
    private static bool GPUIsItPrime(long n)
    {
        // Sieve of Eratosthenes algorithm
        bool[] isComposite = new bool[n];
        var worker = Gpu.Default;
        worker.LongFor(2, n, i =>
        {
            if (!(isComposite[i]))
            {
                for (long j = 2; (j * i) < isComposite.Length; j++)
                    isComposite[j * i] = true;
            }
        });
        return !isComposite[n - 1];
    }

    private static bool CPUIsItPrime(long n)
    {
        // Sieve of Eratosthenes algorithm
        bool[] isComposite = new bool[n];
        for (int i = 2; i < n; i++)
        {
            if (!isComposite[i])
            {
                for (long j = 2; (j * i) < n; j++)
                    isComposite[j * i] = true;
            }
        }
        return !isComposite[n - 1];
    }
Your code doesn't look right. With a parallel for-loop method such as LongFor, Alea spawns "n" threads and passes each one an index "x" that identifies the thread. A simple example such as For(0, n, x => a[x] = x); uses "x" to initialize a[] with { 0, 1, 2, ..., n - 1 }. Your kernel, however, never uses "x", so you run the same code "n" times with no difference between threads. Why run that on a GPU?

What I think you want to do is compute in thread "x" whether "x" is prime, and store the result in prime[x] as true or false. Then, in the kernel, add a sync call, followed by a test that lets a single thread (e.g., x == 0) go through prime[] and pick the largest prime from the array. Otherwise, n threads on the GPU all collide on 'primeNumber[0] = (a - 1);', and I can't imagine how you would ever get the right result.

Finally, you probably want to make sure, using some Alea call, that prime[] is never copied to or from the GPU, but I don't know how you do that in Alea. The compiler might be smart enough to know that prime[] is only used in the kernel code.
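The thread-per-index idea can be sketched roughly as follows. This is a minimal sketch, not the asker's or Alea's implementation, and it makes several assumptions. It uses Parallel.For from System.Threading.Tasks as a CPU stand-in so it runs without a GPU; with Alea, the call shape shown in the post (Gpu.Default.LongFor(from, to, x => ...)) would take its place. The names NthPrimeSketch, UpperBound, and NthPrime are hypothetical. Instead of the in-kernel sync plus single-thread scan suggested above, the final pass runs sequentially on the host, and the array size comes from the known bound p_n < n(ln n + ln ln n) for n >= 6.

```csharp
using System;
using System.Threading.Tasks;

static class NthPrimeSketch
{
    // Upper bound on the n-th prime: p_n < n (ln n + ln ln n) for n >= 6.
    static long UpperBound(int n)
    {
        if (n < 6) return 13; // covers the first five primes (2, 3, 5, 7, 11)
        double ln = Math.Log(n);
        return (long)Math.Ceiling(n * (ln + Math.Log(ln)));
    }

    public static long NthPrime(int n)
    {
        long limit = UpperBound(n);
        bool[] isPrime = new bool[limit + 1];

        // One independent unit of work per index x (CPU stand-in for the GPU loop).
        // Assumption: with Alea this would be Gpu.Default.LongFor(2, limit + 1, x => { ... }).
        Parallel.For(2L, limit + 1, x =>
        {
            bool prime = true;
            for (long b = 2; b * b <= x; b++)
            {
                if (x % b == 0) { prime = false; break; }
            }
            isPrime[x] = prime; // each index writes only its own slot, so no collisions
        });

        // Sequential pass on the host: count flags until the n-th prime is reached.
        long count = 0;
        for (long x = 2; x <= limit; x++)
        {
            if (isPrime[x] && ++count == n) return x;
        }
        throw new InvalidOperationException("Upper bound too small for the requested n.");
    }

    static void Main()
    {
        Console.WriteLine(NthPrime(10000)); // prints 104729
    }
}
```

Because every index writes only to its own element, the result no longer depends on which thread finishes last, which is the problem with all threads writing primeNumber[0].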