EXC_BAD_ACCESS (code=EXC_I386_GPFLT) when indexing accessor

I'd like to run a parallel for loop to initialize a 2 dimensional buffer with random values. But I get an EXC_BAD_ACCESS (code=EXC_I386_GPFLT) exception in the first line of the kernel.

This is the Code (the Pixel struct is from the PixelGameEngine Library):

namespace olc
{
    struct Pixel
    {
        union
        {
            uint32_t n = nDefaultPixel;
            struct
            {
                uint8_t r;
                uint8_t g;
                uint8_t b;
                uint8_t a;
            };
        };

        enum Mode
        {
            NORMAL, MASK, ALPHA, CUSTOM
        };

        Pixel()
        {
            r = 0;
            g = 0;
            b = 0;
            a = nDefaultAlpha;
        }

        Pixel(uint8_t red, uint8_t green, uint8_t blue, uint8_t alpha = nDefaultAlpha)
        {
            n = red | (green << 8) | (blue << 16) | (alpha << 24);
        }

        Pixel(uint32_t p) { n = p; }
    };
}

 int main()
{
    auto* screenBuffer = new olc::Pixel[256 * 240];

    cl::sycl::queue queue;
    {
        cl::sycl::buffer<olc::Pixel, 2> b_screenBuffer(screenBuffer,
                                                       cl::sycl::range<2>(256, 240));

        queue.submit([&](cl::sycl::handler& cgh) {
            auto screenBuffer_acc = b_screenBuffer.get_access<cl::sycl::access::mode::write>(cgh);

            cgh.parallel_for(cl::sycl::range<2>(256, 240), [&](cl::sycl::id<2> index) {
                screenBuffer_acc[index] = olc::Pixel(rand() % 256, //Here I get the exception 
                                                     rand() % 256,
                                                     rand() % 256);
            });
        });
    }

    return 0;
}

I'm using triSYCL 0.1.0 on Mac OS Catalina with boost 1.74 and C++20. I've tried to change the data type of the buffer from old::Pixel to float and to make the buffer 1 dimensional but I always got the same exception. Does anybody have an idea what I could try?

Solution

Your code is fine apart from one small issue:

cgh.parallel_for(cl::sycl::range<2>(256, 240), [&](cl::sycl::id<2> index) {...}

Here you are capturing variables by reference in your kernel. Don't ever do this. If you are targeting an accelerator (e.g. GPU), the captured variables will reference host memory and crash when accessed on device. If you only target CPU, it's still undefined behavior since the kernel is executed asynchronously and the captured variables might not exist anymore when the kernel is run.

Therefore, always capture by value in kernel lambdas:

cgh.parallel_for(cl::sycl::range<2>(256, 240), [=](cl::sycl::id<2> index) {...}

In the lambda for command group scope queue.submit([&](sycl::handler& cgh){...}) it is always fine to capture by reference since the command group scope is always evaluated on the host synchronously.

I don't have a triSYCL install at hand, but after this small modification your code ran fine with hipSYCL (which also runs on Mac by the way, in case you'd like to double-check your code with two implementations which can be helpful sometimes). If changing the lambda capture doesn't solve your problem you might want to open an issue on triSYCL's github repository.