I am working on a testing tool for nvme-cli (written in C and runs on Linux).
I am interested in repeating a nvme command 'r' number of times with 't' number of threads.
The code below repeats each command and uses threading, but the issue is that the parallel execution time is much higher than the serial execution time.
The reason, as far as I can tell, is that nvme_identify(fd, 0, 1, data) in turn calls the ioctl() system call:
    #pragma omp parallel for num_threads(5)
    for(i=0; i<rc; i++){
        err = nvme_identify(fd, 0, 1, data);
        if (!err) {
            if (rf->fmt == BINARY)
                d_raw((unsigned char *)&rf->ctrl, sizeof(rf->ctrl));
            else if (rf->fmt == JSON)
                json_nvme_id_ctrl(data, flags, 0);
            else {
                printf("NVME Identify Controller:\n");
                __show_nvme_id_ctrl(data, flags, 0);
            }
        }
        else if (err > 0)
            fprintf(stderr, "NVMe Status:%s(%x)\n",
                    nvme_status_to_string(err), err);
        else
            perror("identify controller");
    }
So how can I get true parallelism for this, with either OpenMP or pthreads?
You certainly can invoke system calls (listed in syscalls(2)) from different threads (otherwise it wouldn't be possible to write programs that do I/O on several threads, as most multi-threaded web servers do).
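To illustrate that point, here is a minimal sketch (not taken from nvme-cli) in which several pthreads each issue a read(2) system call on one shared file descriptor. The names `reader_main`, `parallel_reads` and `MAX_READERS` are hypothetical, and /dev/zero is only a stand-in for a real device, so this says nothing about how NVMe ioctls behave.

```c
#include <assert.h>
#include <fcntl.h>
#include <pthread.h>
#include <unistd.h>

#define MAX_READERS 16   /* arbitrary upper bound for this sketch */
#define CHUNK 4096       /* bytes each thread tries to read */

struct reader {
    pthread_t tid;
    int fd;        /* shared descriptor, opened once by the caller */
    ssize_t got;   /* result of this thread's read(2) */
};

/* Thread body: one system call, issued from this thread. */
static void *reader_main(void *p)
{
    struct reader *r = p;
    char buf[CHUNK];   /* private buffer, so threads do not share data */
    r->got = read(r->fd, buf, sizeof buf);
    return NULL;
}

/* Starts nthreads threads that each read CHUNK bytes from /dev/zero.
 * Returns how many threads got a full read, or -1 if open() failed. */
static int parallel_reads(int nthreads)
{
    struct reader r[MAX_READERS];
    int started = 0, ok = 0;

    assert(nthreads >= 1 && nthreads <= MAX_READERS);
    int fd = open("/dev/zero", O_RDONLY);
    if (fd < 0)
        return -1;

    for (int t = 0; t < nthreads; t++) {
        r[t].fd = fd;
        r[t].got = -1;
        if (pthread_create(&r[t].tid, NULL, reader_main, &r[t]) != 0)
            break;   /* stop starting threads, but still join the rest */
        started++;
    }
    for (int t = 0; t < started; t++) {
        pthread_join(r[t].tid, NULL);
        if (r[t].got == CHUNK)
            ok++;
    }
    close(fd);
    return ok;
}
```

Calling `parallel_reads(4)` should report that all four threads completed their read. Note that each thread writes into its own buffer and its own result slot; sharing a single `err` or `data` variable between threads would be a data race.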
However, some ioctl (or other weird system calls) may block or take a long time (several seconds or tenths of a second) to execute. For a good example, ejecting a CD-ROM tray is done by an ioctl, and it takes a visible amount of time (perhaps half a second) to perform (because it is a mechanical action).
I guess that, NVMe being related to SSD technology, some operations are slow (because the hardware itself is slow). If your bottleneck is the hardware itself, then no kind of parallelisation will help. It could also happen (I don't really know) that you use the same ioctl (on the same file descriptor) in several threads but the kernel serializes its processing.
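One way to find out which case you are in is to time the same number of calls serially and split across threads. The following is only a sketch under assumptions: `op_fn`, `time_serial`, `time_parallel` and `sleep_op` are hypothetical names, and `sleep_op` (a 10 ms nanosleep) merely stands in for one blocking request. To test the real thing you would pass a small wrapper around your nvme_identify() call instead.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <time.h>

#define MAX_WORKERS 64   /* arbitrary upper bound for this sketch */

/* Hypothetical callback type: one invocation = one command. */
typedef int (*op_fn)(void *arg);

struct worker {
    pthread_t tid;
    op_fn op;
    void *arg;
    int reps;
};

static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Stand-in operation: blocks for 10 ms, like a slow device request. */
static int sleep_op(void *arg)
{
    (void)arg;
    struct timespec d = { .tv_sec = 0, .tv_nsec = 10 * 1000 * 1000 };
    return nanosleep(&d, NULL);
}

static void *worker_main(void *p)
{
    struct worker *w = p;
    for (int i = 0; i < w->reps; i++)
        w->op(w->arg);
    return NULL;
}

/* Wall-clock seconds for reps calls made one after another. */
static double time_serial(op_fn op, void *arg, int reps)
{
    double t0 = now_sec();
    for (int i = 0; i < reps; i++)
        op(arg);
    return now_sec() - t0;
}

/* Wall-clock seconds for reps calls split evenly over nthreads threads.
 * Returns -1.0 if a thread could not be created. */
static double time_parallel(op_fn op, void *arg, int reps, int nthreads)
{
    struct worker w[MAX_WORKERS];
    int started = 0;

    assert(nthreads >= 1 && nthreads <= MAX_WORKERS);
    assert(reps % nthreads == 0);

    double t0 = now_sec();
    for (int t = 0; t < nthreads; t++) {
        w[t].op = op;
        w[t].arg = arg;
        w[t].reps = reps / nthreads;
        if (pthread_create(&w[t].tid, NULL, worker_main, &w[t]) != 0)
            break;
        started++;
    }
    for (int t = 0; t < started; t++)
        pthread_join(w[t].tid, NULL);
    return started == nthreads ? now_sec() - t0 : -1.0;
}
```

With the sleeping stand-in, `time_parallel(sleep_op, NULL, 8, 4)` should come out well below `time_serial(sleep_op, NULL, 8)`, because sleeping threads do not contend for anything. If the same comparison with your real command shows no gain (or a loss), that suggests the device or the kernel is serializing the requests and more threads will not help.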
When the kernel is doing some blocking or long-running ioctl (or any other system call), it uses its scheduler to run other tasks (processes or threads) and does some locking. Your process is then in the D state (see proc(5) and /proc/self/stat, or /proc/1234/stat for the process of pid 1234) and does not even handle signals (see signal(7)), which are postponed.
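If you want to check that state from a program rather than by reading the file by hand, a small helper along these lines might do. It is a sketch only: `proc_state` is a hypothetical name, and it relies on the layout documented in proc(5), where the state letter is the field right after the parenthesised command name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns the one-letter state from /proc/<pid>/stat
 * ('R' running, 'S' sleeping, 'D' uninterruptible wait, ...),
 * or '?' if the file cannot be read or parsed. */
static char proc_state(pid_t pid)
{
    char path[64];
    char buf[512];

    assert(pid > 0);
    snprintf(path, sizeof path, "/proc/%ld/stat", (long)pid);

    FILE *f = fopen(path, "r");
    if (!f)
        return '?';
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';

    /* The command name is wrapped in parentheses and may itself
     * contain spaces or ')', so look for the LAST ')' in the line. */
    char *p = strrchr(buf, ')');
    if (!p || p[1] != ' ' || p[2] == '\0')
        return '?';
    return p[2];
}
```

For instance, `proc_state(getpid())` returns 'R', since a process that is reading its own stat file is running at that moment. To watch your test tool, you would call it from another process (or a monitoring thread) with the tool's pid while the commands are in flight and look for 'D'. For individual threads, the same format is available under /proc/<pid>/task/<tid>/stat.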