DPC++: run a do loop from 1 to n-2 using a parallel_for range

Is it possible to run a do loop whose index goes from 1 to n-2 using a DPC++ parallel_for?

h.parallel_for(range{lx, ly}, [=](id<2> idx) { ... });

This gives a loop from 0 to lx-1 (and from 0 to ly-1 in the second dimension), so I have to add the check

if (idx[0] > 0 && idx[1] > 0 && idx[0] < lx-1 && idx[1] < ly-1) { ... }

and only then run the loop body. Is that the way to do it?

Also, does DPC++ support a 4D parallel_for?

Solution

In SYCL 1.2.1, parallel_for supports offsets, so you could use h.parallel_for(range{lx-2, ly-2}, id{1, 1}, [=](id<2> idx){ ... });.

However, this overload has been deprecated in SYCL 2020:

Offsets to parallel_for, nd_range, nd_item and item classes have been deprecated. As such, the parallel iteration spaces all begin at (0,0,0) and developers are now required to handle any offset arithmetic themselves. The behavior of nd_item.get_global_linear_id() and nd_item.get_local_linear_id() has been clarified accordingly.

So, if you want to conform to the latest standard, you should apply the offset manually:

h.parallel_for(range{lx-2, ly-2}, [=](id<2> idx0) { id<2> idx = idx0 + 1; ... });
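To see why the shifted range is equivalent to the guard in the question, here is a host-side sketch in plain C++ (no SYCL involved) that emulates each work-item as one loop iteration. The helper names `interior_by_offset` and `interior_by_guard` are hypothetical. Both visit exactly the interior points (1..lx-2, 1..ly-2); the sequential order here is only an artifact of the emulation, since on a device the work-items run in no guaranteed order.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Host-side emulation only (no SYCL here): each loop iteration stands for
// one work-item of a 2D parallel_for. The helper names are hypothetical.
using Index2 = std::pair<std::size_t, std::size_t>;

// Manual-offset version: iterate over range{lx-2, ly-2}, which starts at
// (0, 0), and shift each id by 1, like `idx0 + 1` in the kernel above.
std::vector<Index2> interior_by_offset(std::size_t lx, std::size_t ly) {
    assert(lx >= 2 && ly >= 2);  // otherwise lx-2 / ly-2 would wrap around
    std::vector<Index2> visited;
    for (std::size_t i0 = 0; i0 < lx - 2; ++i0)
        for (std::size_t j0 = 0; j0 < ly - 2; ++j0)
            visited.emplace_back(i0 + 1, j0 + 1);
    return visited;
}

// Guard version from the question: iterate over the full range{lx, ly} and
// let the boundary work-items skip the body.
std::vector<Index2> interior_by_guard(std::size_t lx, std::size_t ly) {
    std::vector<Index2> visited;
    for (std::size_t i = 0; i < lx; ++i)
        for (std::size_t j = 0; j < ly; ++j)
            if (i > 0 && j > 0 && i < lx - 1 && j < ly - 1)
                visited.emplace_back(i, j);
    return visited;
}
```

The offset version launches (lx-2)*(ly-2) work-items instead of lx*ly, and none of them diverge on the boundary check; the guard version keeps the full range, which may still map more naturally onto your data layout.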

That said, depending on your data layout, your original approach of launching the full range and leaving the boundary work-items "empty" might be faster.

Also, does DPC++ support a 4D parallel_for?

No. SYCL ranges have at most three dimensions, so you will have to use a 1D range and compute the 4D index manually.
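A minimal sketch of that manual computation, assuming a row-major layout in which the last index varies fastest. `unflatten4d` and `flatten4d` are hypothetical helpers written for this example, not SYCL functions; they only use `std::array` and integer arithmetic.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of SYCL): convert a linear id in
// [0, n0*n1*n2*n3) into a 4D index (i0, i1, i2, i3), row-major with
// i3 varying fastest.
inline std::array<std::size_t, 4> unflatten4d(std::size_t linear,
                                              const std::array<std::size_t, 4>& n) {
    assert(linear < n[0] * n[1] * n[2] * n[3]);  // id must be inside the 4D space
    std::array<std::size_t, 4> idx{};
    idx[3] = linear % n[3]; linear /= n[3];
    idx[2] = linear % n[2]; linear /= n[2];
    idx[1] = linear % n[1]; linear /= n[1];
    idx[0] = linear;  // the remaining quotient is the slowest index
    return idx;
}

// Inverse mapping, handy for checking the round trip.
inline std::size_t flatten4d(const std::array<std::size_t, 4>& idx,
                             const std::array<std::size_t, 4>& n) {
    return ((idx[0] * n[1] + idx[1]) * n[2] + idx[2]) * n[3] + idx[3];
}
```

Inside a kernel launched as `h.parallel_for(range{n[0] * n[1] * n[2] * n[3]}, [=](id<1> lin) { ... })`, the body would call `unflatten4d(lin[0], n)` to recover the four indices; that kernel line is an untested sketch, while the two helpers are plain C++.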