Optimal Way of Using mlockall() for Real-time Application (nanosecond sensitive)

I am reading mlockall()'s manpage: http://man7.org/linux/man-pages/man2/mlock.2.html

It mentions

Real-time processes that are using mlockall() to prevent delays on page 
faults should reserve enough locked stack pages before entering the time-
critical section, so that no page fault can be caused by function calls.  This 
can be achieved by calling a function that allocates a sufficiently large 
automatic variable (an array) and writes to the memory occupied by this array in 
order to touch these stack pages.  This way, enough pages will be mapped for the 
stack and can be locked into RAM.  The dummy writes ensure that not even copy-
on-write page faults can occur in the critical section.

I am a bit confused by this statement:

This can be achieved by calling a function that allocates a sufficiently large 
automatic variable (an array) and writes to the memory occupied by this array in 
order to touch these stack pages.

All the automatic variables (variables on stack) are created "on the fly" on the stack when the function is called. So how can I achieve what the last statement says?

For example, let's say I have this function:

void foo() {
char a;
uint16_t b;
std::deque<int64_t> c;
// do something with those variables
}

Or does it mean before I call any function, I should call a function like this in main():

void reserveStackPages() {
int64_t stackPage[4096/8 * 1024 * 1024];
memset(stackPage, 0, sizeof(stackPage));
}

If yes, does it make a difference if I first allocate the stackPage variable on heap, write and then free? Probably yes, because heap and stack are 2 different region in the RAM?

std::deque exists above is just to bring up another related question -- what if I want to reserve memory for things using both stack pages and heap pages. Will calling "heap" version of reserveStackPages() help?

The goal is to minimize all the jitters in the application (yes, I know there are many other things to look at such as TLB miss, etc; just trying to deal with one kind of jitter at once, and slowly progressing into all).

Thanks in advance.

P.S. This is for a low latency trading application if it matters.

Solution

You generally don't need to use mlockall, unless you code (more or less hard) real-time applications (I actually never used it).

If you do need it, you'll better code in C (not in genuine C++) the most real-time parts of your code, because you surely want to understand the details of memory allocation. Notice that unless you dive into std::deque implementation, you don't exactly know where it is sitting (probably most of the data is heap allocated, even if your c is an automatic variable).

You should first understand in details the virtual address space of your process. For that, proc(5) is useful: from inside your process, you'll read /proc/self/maps (see this), from outside (e.g. some terminal) you'll do cat /proc/1234/maps for a process of pid 1234. Or use pmap(1).

because heap and stack are 2 different regions in the RAM?

In fact, your process' address space contains many segments (listed in /proc/1234/maps), much more that two. Typically every dynamically linked shared library (such as libc.so) brings a few segments.

Try cat /proc/self/maps and cat /proc/$$/maps in your terminal to get a better intuition about virtual address spaces. On my machine, the first gives 19 segments of the cat process -each displayed as a line- and the second 97 segments of the zsh (my shell) process.

To ensure that your stack has enough space, you indeed could call a function allocating a large enough automatic variable, like your reserveStackPages. Beware that call stacks are practically of limited size (a few megabytes usually, see also setrlimit(2)).

If you really need mlockall (which is unlikely) you might consider linking statically your program (to have less segments in your virtual address space).

Look also into madvise(2) (and perhaps mincore(2)). It is generally much more useful than mlockall. BTW, in practice, most of your virtual memory is in RAM (unless your system experiments thrashing, and then you'll see it immediately).

Read also Operating Systems: Three Easy Pieces to understand the role of paging.

PS. Nano-second sensitive applications does not make much sense (because of cache misses that the software does not control).