I need to read many different files in succession as fast as possible. It's not one big file, but many small ones. The files I'm trying to read are the stat files at /proc/<pid>/stat.
I am using std::ifstream and std::getline() to read the files.
Here is my current code:
std::ifstream statFile("/proc/" + pid + "/stat");
if (!statFile.is_open())
{
    std::cerr << "Error: Could not open file for PID " << pid << std::endl;
    return 0; // No fatal error because file may be deleted during read
}
std::string line;
if (!std::getline(statFile, line))
{
    std::cerr << "Error: Could not read from file for PID " << pid << std::endl;
    return 0; // No fatal error because file may be deleted during read
}
I tried using mmap(), but that doesn't seem to work in the /proc/ directory.
I also tried using a buffer, but that was slower.
I recommend opening the files as normal with std::ifstream and then using std::ifstream::read to read the whole file into a fixed-size char array. On my system an array of 957 is enough given the max (or min) values of all the fields in proc_pid_stat(5) plus a max length of 16 for the comm string. I'd round it up to 1024 for good measure. If your system has sizeof(int) greater than 4, double the size of the buffer, or double it anyway. I doubt you'll notice a difference.
For extracting the numerical values, I recommend using std::from_chars, which is supposed to provide the fastest way to convert char arrays into numerical types.
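To show how std::from_chars walks through a buffer, here is a minimal sketch that parses space-separated integers from a character range. The function name parse_longs is my own, made up for illustration, and the input format is a simplification of the stat line rather than the real thing.

```cpp
#include <charconv>
#include <string_view>
#include <system_error>
#include <vector>

// Hypothetical helper: parse space-separated decimal integers.
// Stops at the first token that is not a valid long.
std::vector<long> parse_longs(std::string_view text) {
    std::vector<long> values;
    const char* cur = text.data();
    const char* const end = cur + text.size();
    while (cur != end) {
        if (*cur == ' ' || *cur == '\n') { // skip separators
            ++cur;
            continue;
        }
        long v{};
        // from_chars returns where parsing stopped and an error code
        auto [ptr, ec] = std::from_chars(cur, end, v);
        if (ec != std::errc{}) break; // not a number (or out of range)
        values.push_back(v);
        cur = ptr; // continue right after the parsed number
    }
    return values;
}
```

The returned ptr is what makes it cheap to chain conversions: each call continues where the previous one stopped, with no intermediate strings or streams.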
I'd start by defining a class that can hold the values:
struct proc_pid_stat {
    /*
    (1) pid %d
        The process ID.
    */
    int pid;
    /*
    (2) comm %s
        The filename of the executable, in parentheses.
        Strings longer than TASK_COMM_LEN (16) characters
        (including the terminating null byte) are silently
        truncated. This is visible whether or not the
        executable is swapped out.
    */
    std::string comm;
    //... add all the fields with the correct types ...
    /*
    (52) exit_code %d (since Linux 3.5) [PT]
        The thread's exit status in the form reported by
        waitpid(2).
    */
    int exit_code;
};
To this class I'd add a "magic" value that can be used to indicate that extracting the information from the file failed. It is assigned to the last field in the class when extraction starts, and it will be overwritten if extraction succeeds.
struct proc_pid_stat {
    // same as above goes here
    static constexpr int fail = std::numeric_limits<int>::min();
};
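The sentinel idea can be sketched on a cut-down type. Everything named here (mini_stat, parse_two, ok) is hypothetical and only stands in for the full proc_pid_stat. It relies on std::from_chars leaving the target untouched when conversion fails.

```cpp
#include <charconv>
#include <limits>
#include <string_view>
#include <system_error>

// Hypothetical two-field stand-in for proc_pid_stat, for illustration only.
struct mini_stat {
    static constexpr int fail = std::numeric_limits<int>::min();
    int first = 0;
    int last = fail; // the "last field" carries the sentinel
    bool ok() const { return last != fail; }
};

// Parse "<int> <int>"; if anything goes wrong, `last` keeps the sentinel.
mini_stat parse_two(std::string_view text) {
    mini_stat ms;
    ms.last = mini_stat::fail; // set before extraction starts
    const char* const end = text.data() + text.size();
    auto res = std::from_chars(text.data(), end, ms.first);
    if (res.ec != std::errc{} || res.ptr == end) return ms;
    // skip the single separator; on failure ms.last is left unmodified
    std::from_chars(res.ptr + 1, end, ms.last);
    return ms;
}
```

After extraction, a caller only needs to compare the last field with fail (here wrapped in ok()) to know whether the whole line was parsed.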
Then to the actual extraction. The only messy parts are the comm (2) and state (3) fields, which come early. The rest can be made into a big fold expression in which std::from_chars is used:
// requires <charconv>, <istream>, <iterator>, <string> and <string_view>
struct proc_pid_stat {
    // same as above goes here
    friend std::istream& operator>>(std::istream& is, proc_pid_stat& pps) {
        pps.exit_code = fail; // set the last field to a "fail" value
        char buf[1024];       // max length with all the fields incl. comm is 957
        // read the whole line:
        is.read(buf, static_cast<std::streamsize>(sizeof buf));
        const char* const end = buf + is.gcount();
        // extract fields:
        auto rptr = std::from_chars(buf, end, pps.pid).ptr; // (1)
        if(rptr == end) return is;
        ++rptr; // skip the space, rptr now points at '('
        if(std::distance(rptr, end) < kernel_thread_comm_len) return is;
        // search for the last ')' within the longest possible comm:
        std::string_view comm(rptr, kernel_thread_comm_len);
        const auto cpos = comm.rfind(')');
        if(cpos == std::string_view::npos) return is;
        auto sp = rptr + cpos + 1; // one past ')'
        if(std::distance(sp, end) < 96) return is; // a reasonable amount left
        pps.comm.assign(rptr, sp); // (2), parentheses included
        pps.state = *++sp;         // (3)
        ++sp;
        // if extracting all the rest succeeds, the last field, exit_code,
        // will get a value other than "fail":
        [&](auto&&... rest) {
            (..., (sp = std::from_chars(sp + (sp != end), end, rest).ptr));
        }(pps.ppid /* (4) */, pps.pgrp /* (5) */, pps.session /* (6) */,
          pps.tty_nr /* (7) */, pps.tpgid /* (8) */, pps.flags /* (9) */,
          pps.minflt /* (10) */, pps.cminflt /* (11) */, pps.majflt /* (12) */,
          pps.cmajflt /* (13) */, pps.utime /* (14) */, pps.stime /* (15) */,
          pps.cutime /* (16) */, pps.cstime /* (17) */, pps.priority /* (18) */,
          pps.nice /* (19) */, pps.num_threads /* (20) */,
          pps.itrealvalue /* (21) */, pps.starttime /* (22) */,
          pps.vsize /* (23) */, pps.rss /* (24) */, pps.rsslim /* (25) */,
          pps.startcode /* (26) */, pps.endcode /* (27) */,
          pps.startstack /* (28) */, pps.kstkesp /* (29) */,
          pps.kstkeip /* (30) */, pps.signal /* (31) */, pps.blocked /* (32) */,
          pps.sigignore /* (33) */, pps.sigcatch /* (34) */,
          pps.wchan /* (35) */, pps.nswap /* (36) */, pps.cnswap /* (37) */,
          pps.exit_signal /* (38) */, pps.processor /* (39) */,
          pps.rt_priority /* (40) */, pps.policy /* (41) */,
          pps.delayacct_blkio_ticks /* (42) */, pps.guest_time /* (43) */,
          pps.cguest_time /* (44) */, pps.start_data /* (45) */,
          pps.end_data /* (46) */, pps.start_brk /* (47) */,
          pps.arg_start /* (48) */, pps.arg_end /* (49) */,
          pps.env_start /* (50) */, pps.env_end /* (51) */,
          pps.exit_code /* (52) */
        );
        return is;
    }
};
Note: kernel_thread_comm_len is a constant to deal with comm fields longer than the 16 characters mentioned for the comm field. Kernel tasks may be 64 characters, so that's what I set that constant to.
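The reason for searching backwards for ')' is that the comm itself may contain spaces and parentheses. Here is a simplified sketch of that step. The name extract_comm is hypothetical, and unlike the operator>> above it searches the whole line rather than a fixed window, which assumes that none of the fields after comm contain a ')'.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: return the text between the first '(' and the
// last ')' of a stat line, or an empty view if the line is malformed.
std::string_view extract_comm(std::string_view line) {
    const std::size_t open = line.find('(');
    const std::size_t close = line.rfind(')'); // last ')', not the first
    if (open == std::string_view::npos ||
        close == std::string_view::npos || close < open) {
        return {};
    }
    return line.substr(open + 1, close - open - 1);
}
```

With an input such as "1234 (my (odd) name) S 1 2", a forward search for ')' would cut the name short, while the backward search finds the closing parenthesis that actually ends the field.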
Next comes the question of which processes to collect the information for. If you have a std::vector of process IDs, you could add a function that populates a std::vector<proc_pid_stat>:
// requires <algorithm>, <execution>, <filesystem>, <fstream>, <ranges>,
// <string> and <vector>
auto get_proc_pid_stats(std::ranges::random_access_range auto&& pids) {
    static const std::filesystem::path proc("/proc");
    std::vector<proc_pid_stat> ppss(std::ranges::size(pids));
    auto zw = std::views::zip(pids, ppss);
    auto fillfunc = [](auto&& pid_pps) {
        auto& [pid, pps] = pid_pps;
        auto path = proc / std::to_string(pid) / "stat";
        std::ifstream is(path);
        is >> pps;
    };
    std::for_each(std::execution::par, std::ranges::begin(zw),
                  std::ranges::end(zw), fillfunc);
    return ppss;
}
Note: The above uses the built-in thread pool (if your implementation supports it). You may need to link with the library implementing it for it to be useful; -ltbb is common. Should you for some reason not want to use the thread pool, change std::execution::par to std::execution::seq and measure the difference in time.
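For the measuring part, a small timing wrapper is enough. This is only a sketch, and time_call is a name I made up for it. It uses std::chrono::steady_clock so that the result isn't affected by wall-clock adjustments.

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper: run a callable once and return the elapsed time.
template <class F>
std::chrono::microseconds time_call(F&& f) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<F>(f)(); // the work being measured
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start);
}
```

You could wrap a call such as get_proc_pid_stats(pids) in a lambda, pass it to time_call once per execution policy, and compare the two durations. Repeating each measurement a few times should give a steadier picture than a single run.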
If you want all the processes, you can make it more efficient by not building the filename for every process file like I did in get_proc_pid_stats above. Just collect the paths and use those instead of pids in the loop above (each collected path is a /proc/<pid> directory, so only "stat" needs to be appended):
std::vector<std::filesystem::path> pids;
for(auto& de : std::filesystem::directory_iterator("/proc")) {
    if(std::isdigit(
           static_cast<unsigned char>(de.path().filename().string().front())))
    {
        pids.emplace_back(de.path());
    }
}
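The loop above only looks at the first character of each directory name. If you'd rather check the whole name, a stricter test could look like the sketch below. The name is_pid_name is hypothetical, and whether the extra check is worth it on your system is an assumption you'd have to verify.

```cpp
#include <algorithm>
#include <cctype>
#include <string_view>

// Hypothetical helper: true if the name is non-empty and all digits.
bool is_pid_name(std::string_view name) {
    return !name.empty() &&
           std::all_of(name.begin(), name.end(), [](unsigned char c) {
               return std::isdigit(c) != 0;
           });
}
```

It could replace the std::isdigit test in the loop by passing it the string from de.path().filename().string().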