I am new to oneAPI and similar frameworks, and I am having trouble with data management using SYCL buffers.
My task is to find substrings in a given string using the Aho-Corasick algorithm.
My idea was to build a trie and then submit a kernel that searches the trie for substrings in parallel. To do that, I created a SYCL queue and buffers for the string (the one to search in), for a vector (to store the search results), and for my Aho-Corasick object, which contains the root of the previously built trie. However, I'm not sure about the last one, since I am creating a buffer for an object in host memory that contains pointers to other objects (such as Nodes, which in turn contain pointers to other Nodes).
The structure of the Node object:
class Node {
typedef Node *node_ptr;
private:
std::set<std::pair<int, std::string>> retVals;
std::unordered_map<char, node_ptr> children;
node_ptr fail;
char value;
This is the searching method:
void
matchWords(char *text, int startIdx, int endIdx, cl::sycl::cl_int *matched) {
    node_ptr child = start;
    int item = startIdx;
    for (int i = startIdx; i < endIdx; ++i) {
        child = child->nextNode(text[i]);
        if (child == nullptr) {
            child = start;
            continue;
        }
        for (const auto &returns: child->getRetVals()) {
            matched[item++] = returns.first;
            if (item == endIdx) item = startIdx;
        }
    }
}
Buffers:
cl::sycl::buffer<char, 1> fasta_buf(tempFasta.data(), cl::sycl::range<1>(len));
cl::sycl::buffer<cl::sycl::cl_int, 1> vec_buf(vec.data(), cl::sycl::range<1>(len));
cl::sycl::buffer<aho_corasick::AhoCorasick, 1> aho_buf(a, cl::sycl::range<1>(1));
And the queue submission:
q.submit([&](cl::sycl::handler &cgh) {
    auto string_acc = fasta_buf.get_access<cl::sycl::access::mode::read>(cgh);
    auto vec_acc = vec_buf.get_access<cl::sycl::access::mode::read_write>(cgh);
    auto aho_acc = aho_buf.get_access<cl::sycl::access::mode::read>(cgh);
    cgh.parallel_for<class dummy>(
        cl::sycl::range<1>(10), [=](cl::sycl::item<1> i) {
            // 10 is the number of workers I want
            int startInx = (int) (i.get_linear_id() * (len / 10));
            int endInx = (int) ((i.get_linear_id() + 1) * (len / 10));
            aho_acc.get_pointer()->matchWords(string_acc.get_pointer(), startInx, endInx, vec_acc.get_pointer());
        });
});
q.wait_and_throw();
I figured out that the program fails when it tries to access the items of the children map. So I think the problem is that the pointers stored in the map point to host memory, which the device cannot access.
If I understood correctly, you are attempting to use std::unordered_map, std::string and std::set in device code. I'm not an expert on Intel-specific oneAPI SYCL extensions, but in pure SYCL 1.2.1 this is not allowed, and I would be surprised if this works in DPC++.
The SYCL 1.2.1 spec does not really define how SYCL interacts with the standard library. While some implementations may, as an extension, guarantee that certain well-defined portions of the standard library work in device code (commonly, for example, the std:: math functions), this is not universally guaranteed across SYCL implementations.
Additionally, I would imagine that supporting STL containers in device code (which the SYCL spec does not require) is particularly difficult, and I have never heard of a SYCL implementation that does so. Containers typically rely on mechanisms that need runtime support, such as throwing exceptions, and these are unsupported in SYCL device code. Because there is no C++ runtime on, say, a GPU, such mechanisms cannot work in SYCL.
It is also important to understand that this is not really a SYCL-specific limitation, but a common restriction among heterogeneous programming models. Other heterogeneous programming models such as CUDA impose similar restrictions for similar reasons.
Another difficulty with containers in kernels is that STL data structures are usually not really designed for the massively parallel SIMT execution model on a SYCL device, making them prone to race conditions.
The final problem is the one you have already identified: you are copying pointers to host memory. Since you are on oneAPI DPC++, the easiest way to work with pointer-based data structures is the Intel SYCL extension for unified shared memory (USM), which can be used to create pointers that are valid on both host and device. There is also a USM allocator that could be passed to containers if they were supported in device code.
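If USM is not an option, a different technique (not what the question's code does, and not part of SYCL itself) is to flatten the automaton into plain integer arrays, so that every link is an index rather than a pointer. The sketch below is plain host C++ and has not been run on a device. The names FlatAutomaton, build, count_matches and kAlphabet are hypothetical, and the 256-entry transition table per node is an assumption made for simplicity, not a tuned layout.

```cpp
#include <cstddef>
#include <cstdint>
#include <queue>
#include <string>
#include <vector>

constexpr int kAlphabet = 256;  // assumption: one transition slot per byte value

// Hypothetical flat layout: all links are indices into these arrays,
// so they remain valid wherever the arrays are copied.
struct FlatAutomaton {
    std::vector<int32_t> next;   // next[node * kAlphabet + c] = target node
    std::vector<int32_t> fail;   // failure link of each node
    std::vector<int32_t> match;  // id of a pattern ending at the node, or -1
    std::vector<int32_t> link;   // nearest proper suffix node with a match, or -1
};

// Host-side construction; STL containers are fine here.
FlatAutomaton build(const std::vector<std::string>& patterns) {
    FlatAutomaton a;
    auto add_node = [&a]() {
        a.next.insert(a.next.end(), kAlphabet, -1);
        a.fail.push_back(0);
        a.match.push_back(-1);
        a.link.push_back(-1);
        return static_cast<int32_t>(a.fail.size() - 1);
    };
    add_node();  // node 0 is the root
    for (std::size_t id = 0; id < patterns.size(); ++id) {
        int32_t node = 0;
        for (unsigned char c : patterns[id]) {
            int32_t child = a.next[node * kAlphabet + c];
            if (child == -1) {
                child = add_node();
                a.next[node * kAlphabet + c] = child;
            }
            node = child;
        }
        a.match[node] = static_cast<int32_t>(id);
    }
    // Breadth-first pass: fill failure links and replace missing
    // transitions, so matching never has to follow a failure chain.
    std::queue<int32_t> bfs;
    for (int c = 0; c < kAlphabet; ++c) {
        int32_t& t = a.next[c];
        if (t == -1) t = 0; else bfs.push(t);
    }
    while (!bfs.empty()) {
        int32_t u = bfs.front();
        bfs.pop();
        int32_t f = a.fail[u];
        a.link[u] = (a.match[f] != -1) ? f : a.link[f];
        for (int c = 0; c < kAlphabet; ++c) {
            int32_t& t = a.next[u * kAlphabet + c];
            if (t == -1) {
                t = a.next[f * kAlphabet + c];
            } else {
                a.fail[t] = a.next[f * kAlphabet + c];
                bfs.push(t);
            }
        }
    }
    return a;
}

// Matching uses only raw pointers and integers: no STL, no exceptions.
int count_matches(const int32_t* next, const int32_t* match,
                  const int32_t* link, const char* text, int begin, int end) {
    int count = 0;
    int32_t node = 0;
    for (int i = begin; i < end; ++i) {
        node = next[node * kAlphabet + static_cast<unsigned char>(text[i])];
        for (int32_t m = node; m != -1; m = link[m])
            if (match[m] != -1) ++count;
    }
    return count;
}
```

Because the arrays hold only integers, each one could presumably be wrapped in its own buffer, the same way the question does for vec_buf, and count_matches could serve as a starting point for the kernel body. Note that splitting the text into fixed chunks, as the question's kernel does, would still miss matches that straddle a chunk boundary.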