I am writing a program that runs Monte Carlo simulations using OpenCL, and I have run into an issue involving exponentials. When the variable steps becomes large (approximately 20000), the calculation of the exponent fails unexpectedly and the program quits with "Abort Trap: 6". This seems bizarre, given that steps should not affect memory allocation. Setting normal, alpha, and beta to 0 does not resolve the problem; however, commenting out the exponent and replacing it with the constant 1 does. The same code runs without any issues on an AWS GPU instance. Does anybody have any idea why this might be a problem on an integrated graphics card?
Execute the kernel multiple times over smaller ranges to keep each kernel execution under 5 seconds.
Code Snippet
#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

// MWC64X multiply-with-carry generator: returns 32 random bits and advances the state.
static uint MWC64X(uint2 *state) {
    enum { A = 4294883355U };
    uint x = (*state).x, c = (*state).y;
    uint res = x ^ c;
    uint hi = mul_hi(x, A);
    x = x * A + c;
    c = hi + (x < c);
    *state = (uint2)(x, c);
    return res;
}

__kernel void discreteMonteCarloKernel(...) {
    float cumulativeWalk = stockPrice;
    float currentValue = stockPrice;
    uint n = get_global_id(0);
    // Two generator states per work-item, one for each uniform sample.
    uint2 seed2 = (uint2)(n, seed);
    uint random1 = MWC64X(&seed2);
    uint2 seed3 = (uint2)(random1, seed);
    uint random2 = MWC64X(&seed3);
    float alpha = (interestRate - 0.5 * sigma * sigma) * dt;
    float beta = sigma * sqrt(dt);
    float u1;
    float u2;
    float a;
    float b;
    float normal;
    float exponent;
    for (int j = 0; j < steps; j++) {
        random1 = MWC64X(&seed2);
        if (random1 == 0) {
            // Redraw so that log(u1) is not evaluated at zero.
            random1 = MWC64X(&seed2);
        }
        random2 = MWC64X(&seed3);
        u1 = (float)random1 / (float)0xffffffff;
        u2 = (float)random2 / (float)0xffffffff;
        // Box-Muller transform: two uniform samples to one normal sample.
        a = sqrt(-2 * log(u1));
        b = 2 * M_PI * u2;
        normal = a * sin(b);
        exponent = exp(alpha + beta * normal);
        currentValue = currentValue * exponent;
        cumulativeWalk += currentValue;
    }
}
Problem Report
Exception Type: EXC_CRASH (SIGABRT)
Exception Codes: 0x0000000000000000, 0x0000000000000000
Application Specific Information:
abort() called
Application Specific Signatures:
Graphics hardware encountered an error and was reset: 0x00000813
Thread 0 Crashed:: Dispatch queue: opencl_runtime
0 libsystem_kernel.dylib 0x00007fffb14bad42 __pthread_kill + 10
1 libsystem_pthread.dylib 0x00007fffb15a85bf pthread_kill + 90
2 libsystem_c.dylib 0x00007fffb1420420 abort + 129
3 libGPUSupportMercury.dylib 0x00007fffa98e6fbf gpusGenerateCrashLog + 158
4 com.apple.driver.AppleIntelHD5000GraphicsGLDriver 0x000000010915f13b gpusKillClientExt + 9
5 libGPUSupportMercury.dylib 0x00007fffa98e7983 gpusQueueSubmitDataBuffers + 168
6 com.apple.driver.AppleIntelHD5000GraphicsGLDriver 0x00000001091aa031 IntelCLCommandBuffer::getNew(GLDQueueRec*) + 31
7 com.apple.driver.AppleIntelHD5000GraphicsGLDriver 0x00000001091a9f99 intelSubmitCLCommands(GLDQueueRec*, unsigned int) + 65
8 com.apple.driver.AppleIntelHD5000GraphicsGLDriver 0x00000001091b00a1 CHAL_INTEL::ChalContext::ChalFlush() + 83
9 com.apple.driver.AppleIntelHD5000GraphicsGLDriver 0x00000001091aa2c3 gldFinishQueue + 43
10 com.apple.opencl 0x00007fff9ffeeb37 0x7fff9ffed000 + 6967
11 com.apple.opencl 0x00007fff9ffef000 0x7fff9ffed000 + 8192
12 com.apple.opencl 0x00007fffa000ccca 0x7fff9ffed000 + 130250
13 com.apple.opencl 0x00007fffa001029d 0x7fff9ffed000 + 144029
14 libdispatch.dylib 0x00007fffb13568fc _dispatch_client_callout + 8
15 libdispatch.dylib 0x00007fffb1357536 _dispatch_barrier_sync_f_invoke + 83
16 com.apple.opencl 0x00007fffa001011d 0x7fff9ffed000 + 143645
17 com.apple.opencl 0x00007fffa000bda6 0x7fff9ffed000 + 126374
18 com.apple.opencl 0x00007fffa00011df clEnqueueReadBuffer + 813
19 simplisticComparison 0x0000000107b953cf BinomialMultiplication::execute(int) + 1791
20 simplisticComparison 0x0000000107b9ec7f main + 767
21 libdyld.dylib 0x00007fffb138c235 start + 1
Thread 1:
0 libsystem_pthread.dylib 0x00007fffb15a50e4 start_wqthread + 0
1 ??? 0x000070000eed6b30 0 + 123145552751408
Thread 2:
0 libsystem_pthread.dylib 0x00007fffb15a50e4 start_wqthread + 0
Thread 3:
0 libsystem_pthread.dylib 0x00007fffb15a50e4 start_wqthread + 0
1 ??? 0x007865646e496d65 0 + 33888479226719589
Thread 0 crashed with X86 Thread State (64-bit):
rax: 0x0000000000000000 rbx: 0x0000000000000006 rcx: 0x00007fff58074078 rdx: 0x0000000000000000
rdi: 0x0000000000000307 rsi: 0x0000000000000006 rbp: 0x00007fff580740a0 rsp: 0x00007fff58074078
r8: 0x0000000000000000 r9: 0x00007fffb140ba50 r10: 0x0000000008000000 r11: 0x0000000000000206
r12: 0x00007f92de80a7e0 r13: 0x00007f92e0008c00 r14: 0x00007fffba29e3c0 r15: 0x00007f92de801a00
rip: 0x00007fffb14bad42 rfl: 0x0000000000000206 cr2: 0x00007fffba280128
Logical CPU: 0
Error Code: 0x02000148
Trap Number: 133
I have a guess. The driver can crash in two ways:
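My money is on #2, the timeout. If a larger value of steps makes the kernel run too long, the system will kill it and reset the GPU, which would be consistent with the "Graphics hardware encountered an error and was reset" line in the problem report.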
I am not familiar with the internals of Apple's Intel driver, but typically there is a way to disable the TDR (timeout detection and recovery) in extreme cases. For example, see the Windows documentation on TDRs to get the gist. (Linux drivers have a way to disable this too.)
Normally we want to avoid running kernels that take very long, so it might be a good idea to decompose the workload so that you naturally don't hit this kill switch. For example, split "steps" into smaller chunks, passing in and saving the state for the parts you can't recompute (here, the random generator state and the running values).
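To make the chunking idea concrete, here is a minimal CPU-only sketch in C++. It is not the asker's host code: WalkState, walkInit, walkChunk and runChunked are hypothetical names, and walkChunk merely stands in for one kernel launch (on the GPU that would be one clEnqueueNDRangeKernel followed by a wait such as clFinish). The point it tries to show is that if the generator state and the running values are carried from one launch to the next, several short launches should walk the same path as one long launch.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical per-path state that must survive between launches,
// because it cannot be recomputed from the step index alone.
struct WalkState {
    std::uint32_t x1, c1;   // MWC64X state of the first stream (seed2 in the kernel)
    std::uint32_t x2, c2;   // MWC64X state of the second stream (seed3 in the kernel)
    float currentValue;
    float cumulativeWalk;
};

// CPU port of the MWC64X step from the question.
static std::uint32_t mwc64x(std::uint32_t &x, std::uint32_t &c) {
    const std::uint64_t A = 4294883355ULL;
    const std::uint32_t res = x ^ c;
    const std::uint64_t prod = static_cast<std::uint64_t>(x) * A;
    const std::uint32_t hi = static_cast<std::uint32_t>(prod >> 32);  // mul_hi(x, A)
    const std::uint32_t oldC = c;
    x = static_cast<std::uint32_t>(prod) + oldC;  // x * A + c, modulo 2^32
    c = hi + (x < oldC ? 1u : 0u);                // carry out of the addition
    return res;
}

// Mirrors the seeding done at the top of the kernel for work-item n.
static WalkState walkInit(std::uint32_t n, std::uint32_t seed, float stockPrice) {
    WalkState s{n, seed, 0u, seed, stockPrice, stockPrice};
    s.x2 = mwc64x(s.x1, s.c1);  // random1 seeds the second stream
    mwc64x(s.x2, s.c2);         // the kernel draws random2 once before the loop
    return s;
}

// Stand-in for ONE kernel launch: advance the walk by `steps` steps only.
static void walkChunk(WalkState &s, int steps, float alpha, float beta) {
    const float twoPi = 6.28318530717958647692f;
    for (int j = 0; j < steps; j++) {
        std::uint32_t r1 = mwc64x(s.x1, s.c1);
        if (r1 == 0) {
            r1 = mwc64x(s.x1, s.c1);  // redraw so log(u1) is not taken at zero
        }
        const std::uint32_t r2 = mwc64x(s.x2, s.c2);
        const float u1 = static_cast<float>(r1) / 4294967295.0f;
        const float u2 = static_cast<float>(r2) / 4294967295.0f;
        const float normal = std::sqrt(-2.0f * std::log(u1)) * std::sin(twoPi * u2);
        s.currentValue *= std::exp(alpha + beta * normal);
        s.cumulativeWalk += s.currentValue;
    }
}

// Host-side loop: many short launches instead of one long one.
// Returns the number of launches that were needed.
static int runChunked(WalkState &s, int totalSteps, int chunkSize,
                      float alpha, float beta) {
    assert(chunkSize > 0);
    int launches = 0;
    for (int done = 0; done < totalSteps; done += chunkSize) {
        const int n = std::min(chunkSize, totalSteps - done);
        walkChunk(s, n, alpha, beta);  // on the GPU: enqueue the kernel, then wait
        ++launches;
    }
    return launches;
}
```

For example, calling runChunked(state, 20000, 3000, alpha, beta) issues seven launches of at most 3000 steps each. In a real OpenCL port the WalkState fields would presumably live in a buffer that each launch reads at the start and writes back at the end; the chunk size is a tuning knob, chosen so that a single launch stays well under the driver's timeout.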