Is it possible to have a loop whose index runs from 1 to n-2 using a DPC++ parallel_for?
h.parallel_for(range{lx , ly }, [=](id<2> idx
This gives a loop from 0 to lx-1 (and 0 to ly-1), so do I have to check
idx[0]>0 && idx[1]>0 && idx[0]<lx-1 && idx[1]<ly-1
inside the kernel to restrict the loop to the interior?
Also, does DPC++ support something like a 4D parallel_for?
In SYCL 1.2.1, parallel_for supports offsets, so you could use
h.parallel_for(range{lx-2, ly-2}, id{1, 1}, [=](id<2> idx){ ... });
However, this overload has been deprecated in SYCL 2020:
Offsets to parallel_for, nd_range, nd_item and item classes have been deprecated. As such, the parallel iteration spaces all begin at (0,0,0) and developers are now required to handle any offset arithmetic themselves. The behavior of nd_item.get_global_linear_id() and nd_item.get_local_linear_id() has been clarified accordingly.
So, if you want to conform to the latest standard, you should apply the offset manually:
h.parallel_for(range{lx-2, ly-2}, [=](id<2> idx0) { id<2> idx = idx0 + 1; ... });
That said, depending on your data layout, your original approach of having "empty" threads might be faster.
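To make the two options concrete, here is a minimal sketch of the index logic in plain C++, without the SYCL headers so it can be checked on the host. The helper names is_interior and shift_to_interior are hypothetical and not part of SYCL. The first is the guard you would evaluate in a kernel launched over range{lx, ly}; the second is the shift you would apply in a kernel launched over range{lx-2, ly-2}.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper (not a SYCL API): guard for a kernel launched over
// the full range{lx, ly}. Work-items on the boundary return false and do
// nothing. Written as i + 1 < lx instead of i < lx - 1 so that the
// unsigned arithmetic cannot underflow when lx or ly is 0.
inline bool is_interior(std::size_t i, std::size_t j,
                        std::size_t lx, std::size_t ly) {
    return i > 0 && j > 0 && i + 1 < lx && j + 1 < ly;
}

// Hypothetical helper (not a SYCL API): manual offset for a kernel
// launched over range{lx-2, ly-2}. Maps the zero-based index (i0, j0)
// to the interior point (i0 + 1, j0 + 1).
inline std::array<std::size_t, 2> shift_to_interior(std::size_t i0,
                                                    std::size_t j0) {
    return {i0 + 1, j0 + 1};
}
```

Both variants visit the same (lx-2) x (ly-2) interior points. The guarded version launches the boundary work-items as well and lets them exit early, while the shifted version launches only the work-items it needs.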
Also, does DPC++ support something like a 4D parallel_for?
No. SYCL ranges have at most three dimensions, so you will have to use a 1D range and compute the 4D index manually.
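As a rough sketch of that manual computation, the following plain C++ helpers convert between a linear index and 4D coordinates. The names flatten4 and unflatten4 are hypothetical (not part of SYCL), and a row-major layout with the last index varying fastest is assumed; adapt the order if your data is laid out differently.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper (not a SYCL API): linear index of (i, j, k, l) in an
// n0 x n1 x n2 x n3 space, row-major (l varies fastest). n0 is not needed.
inline std::size_t flatten4(std::size_t i, std::size_t j, std::size_t k,
                            std::size_t l, std::size_t n1, std::size_t n2,
                            std::size_t n3) {
    return ((i * n1 + j) * n2 + k) * n3 + l;
}

// Hypothetical helper (not a SYCL API): inverse of flatten4. Peels off the
// fastest-varying coordinate first with modulo and integer division.
inline std::array<std::size_t, 4> unflatten4(std::size_t linear,
                                             std::size_t n1, std::size_t n2,
                                             std::size_t n3) {
    const std::size_t l = linear % n3; linear /= n3;
    const std::size_t k = linear % n2; linear /= n2;
    const std::size_t j = linear % n1; linear /= n1;
    return {linear, j, k, l};  // what remains of linear is i
}

// Possible use inside a kernel (sketch only, not compiled here):
//   h.parallel_for(range<1>{n0 * n1 * n2 * n3}, [=](id<1> idx) {
//       auto c = unflatten4(idx[0], n1, n2, n3);
//       // c[0], c[1], c[2], c[3] play the role of the 4D index
//   });
```

The divisions and modulos add some cost per work-item. If that matters, another option is to launch a 3D range and split only one of its dimensions into two.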