I want to debug a DPC++ program with the Intel Distribution for GDB, which is installed as part of the Intel oneAPI Base Toolkit.
The download link is:
[https://software.intel.com/content/www/us/en/develop/tools/oneapi/base-toolkit.html#gs.ynm6aj]
How do I debug the DPC++ program below with the kernel offloaded to the GPU, and how do I switch between inferiors and threads? The code is as follows.
#include <CL/sycl.hpp>
#include <iostream>
// Location of file: <oneapi-root>/dev-utilities/<version>/include
#include "dpc_common.hpp"
#include "selector.hpp"

using namespace std;
using namespace sycl;

// A device function, called from inside the kernel.
static size_t GetDim(id<1> wi, int dim) {
  return wi[dim];
}

int main(int argc, char *argv[]) {
  constexpr size_t length = 64;
  int input[length];
  int output[length];

  // Initialize the input
  for (int i = 0; i < length; i++)
    input[i] = i + 100;

  try {
    CustomSelector selector(GetDeviceType(argc, argv));
    queue q(selector, dpc_common::exception_handler);
    cout << "[SYCL] Using device: ["
         << q.get_device().get_info<info::device::name>()
         << "] from ["
         << q.get_device().get_platform().get_info<info::platform::name>()
         << "]\n";

    range data_range{length};
    buffer buffer_in{input, data_range};
    buffer buffer_out{output, data_range};

    q.submit([&](auto &h) {
      accessor in(buffer_in, h, read_only);
      accessor out(buffer_out, h, write_only);

      // kernel-start
      h.parallel_for(data_range, [=](id<1> index) {
        size_t id0 = GetDim(index, 0);
        int element = in[index]; // breakpoint-here
        int result = element + 50;
        if (id0 % 2 == 0) {
          result = result + 50; // then-branch
        } else {
          result = -1; // else-branch
        }
        out[index] = result;
      });
      // kernel-end
    });

    q.wait_and_throw();
  } catch (sycl::exception const& e) {
    cout << "fail; synchronous exception occurred: " << e.what() << "\n";
    return -1;
  }

  // Verify the output
  for (int i = 0; i < length; i++) {
    int result = (i % 2 == 0) ? (input[i] + 100) : -1;
    if (output[i] != result) {
      cout << "fail; element " << i << " is " << output[i] << "\n";
      return -1;
    }
  }

  cout << "success; result is correct.\n";
  return 0;
}
The debugger can list the threads of the application with the "info threads" command. The listing includes the thread IDs and the locations at which the threads are currently stopped. For GPU threads, the debugger also prints the active SIMD lanes. GDB displays the threads in the following format:
<inferior_number>.<thread_number>:<SIMD Lane/s>
You can switch the thread, the SIMD lane, or both to change the context with the "thread" command, for example "thread 3:4", "thread :6", or "thread 7". The first command switches to thread 3 and SIMD lane 4. The second switches to SIMD lane 6 within the current thread. The third switches to thread 7; in that case the selected lane is the previously selected lane if it is still active, otherwise the first active lane within the thread. For more details, please refer to the link below, which explains how to debug code on a GPU device.
[https://software.intel.com/content/www/us/en/develop/documentation/debugging-dpcpp-linux/top/debug-a-dpc-application-on-a-gpu/basic-debugging-1.html]
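To tie the commands above together, here is a rough sketch of a debugging session, not a verified transcript. It assumes the program was built with debug information (for example with -g -O0), that the source file is called array-transform.cpp and the binary array-transform (both names are assumptions), and that passing "gpu" as the first argument makes GetDeviceType select the GPU. The GPU is expected to show up as an additional inferior next to the host process, so "info inferiors" and "inferior N" are the usual GDB way to move between them. All inferior, thread, and lane numbers below are illustrative and will differ on your machine.

```text
# Start the debugger from the shell first (binary name is an assumption):
#   gdb-oneapi ./array-transform

# Stop inside the kernel. Replace LINE with the line number of the
# "breakpoint-here" comment in your copy of the source file.
break array-transform.cpp:LINE

# Run with the kernel offloaded to the GPU (the value "gpu" is an assumption
# about what GetDeviceType accepts).
run gpu

# List the inferiors and switch to another one (number is illustrative).
info inferiors
inferior 2

# List the threads, including the active SIMD lanes of the GPU threads.
info threads

# Switch thread and SIMD lane, then inspect a kernel variable in that context.
thread 3:4
print element

# Switch only the SIMD lane within the current thread.
thread :6
print element

# Switch only the thread; the lane is chosen as described above.
thread 7
```

If "thread 3:4" reports that the thread or lane does not exist, pick an ID and an active lane from the "info threads" output instead.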