C++ system() raises ENOMEM

This question is a M(not)WE of this question. I wrote a short program that reproduces the error:

#include <cstdlib>
#include <iostream>
#include <vector>

int *watch_errno = __errno_location();

int main(){
    std::vector<double> a(7e8,1);  // allocate a big chunk of memory
    std::cout<<std::system(NULL)<<std::endl;
}

Compile it with g++ -ggdb -std=c++11 (I used g++ 4.9 on Debian). The int *watch_errno variable exists only so that gdb can watch errno.

When I run it under gdb, I get this:

(gdb) watch *watch_errno 
Hardware watchpoint 1: *watch_errno
(gdb) r
Starting program: /tmp/bug 
Hardware watchpoint 1: *watch_errno

Old value = <unreadable>
New value = 0
__static_initialization_and_destruction_0 (__initialize_p=1, __priority=65535) at bug.cpp:10
10      }
(gdb) c
Continuing.
Hardware watchpoint 1: *watch_errno

Old value = 0
New value = 12
0x00007ffff7252421 in do_system (line=line@entry=0x7ffff7372168 "exit 0") at ../sysdeps/posix/system.c:116
116     ../sysdeps/posix/system.c: No such file or directory.
(gdb) bt
#0  0x00007ffff7252421 in do_system (line=line@entry=0x7ffff7372168 "exit 0") at ../sysdeps/posix/system.c:116
#1  0x00007ffff7252510 in __libc_system (line=<optimized out>) at ../sysdeps/posix/system.c:182
#2  0x0000000000400ad8 in main () at bug.cpp:9
(gdb) l
111     in ../sysdeps/posix/system.c
(gdb) c
Continuing.
0
[Inferior 1 (process 5210) exited normally]

For some reason errno is set to ENOMEM at line 9, which corresponds to the system() call. Note that with a smaller vector (the threshold presumably depends on the machine running the code), everything works fine and system(NULL) returns 1, as it should when a shell is available.

Why is ENOMEM raised? Why isn't the code using swap? Is this a bug? Is there a workaround? Would popen or exec* do the same? (I know, I should only ask one question per post, but all these questions could be summarized as "what is going on?")

As requested, here is the result of ulimit -a:

-t: cpu time (seconds)              unlimited
-f: file size (blocks)              unlimited
-d: data seg size (kbytes)          unlimited
-s: stack size (kbytes)             8192
-c: core file size (blocks)         0
-m: resident set size (kbytes)      unlimited
-u: processes                       30852
-n: file descriptors                65536
-l: locked-in-memory size (kbytes)  64
-v: address space (kbytes)          unlimited
-x: file locks                      unlimited
-i: pending signals                 30852
-q: bytes in POSIX msg queues       819200
-e: max nice                        0
-r: max rt priority                 0
-N 15:                              unlimited

And here is the relevant part of strace -f myprog:

mmap(NULL, 5600002048, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7faa98562000
rt_sigaction(SIGINT, {SIG_IGN, [], SA_RESTORER, 0x7fabe622b180}, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGQUIT, {SIG_IGN, [], SA_RESTORER, 0x7fabe622b180}, {SIG_DFL, [], 0}, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
clone(child_stack=0, flags=CLONE_PARENT_SETTID|SIGCHLD, parent_tidptr=0x7fff8797635c) = -1 ENOMEM (Cannot allocate memory)
rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x7fabe622b180}, NULL, 8) = 0
rt_sigaction(SIGQUIT, {SIG_DFL, [], SA_RESTORER, 0x7fabe622b180}, NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 1), ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fabe6fde000
write(1, "0\n", 20
)                      = 2
write(1, "8\n", 28
)                      = 2
munmap(0x7faa98562000, 5600002048)      = 0

Here is the output of free:

           total       used       free     shared    buffers     cached
Mem:       7915060    1668928    6246132      49576      34668    1135612
-/+ buffers/cache:     498648    7416412
Swap:      2928636          0    2928636

Solution

The system() function works by first creating a new copy of the process with fork() or similar (in Linux, this ends up in the clone() system call, as you show) and then, in the child process, calling exec to create a shell running the desired command.
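The two steps described above can be sketched roughly as follows. This is a simplified illustration, not glibc's actual implementation: simple_system is a hypothetical name, and the real system() also blocks SIGCHLD and ignores SIGINT and SIGQUIT while it waits, as the strace output in the question shows.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: a simplified stand-in for system().
// It omits the signal handling that the real implementation performs.
int simple_system(const char* command) {
    pid_t pid = fork();  // this is the step that can fail with ENOMEM
    if (pid < 0) {
        return -1;       // fork failed; errno says why
    }
    if (pid == 0) {
        // Child: replace the copied process image with a shell.
        execl("/bin/sh", "sh", "-c", command, static_cast<char*>(nullptr));
        _exit(127);      // only reached if exec itself failed
    }
    // Parent: wait for the shell to finish and collect its status.
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    return status;       // decode with WIFEXITED / WEXITSTATUS
}
```

The point to notice is that the failure happens at fork(), before the child ever gets the chance to shrink itself with exec.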

The fork() call can fail if there is insufficient virtual memory for the new process (even though you intend to immediately replace it with a much smaller footprint, the kernel can't know that). Some systems let you fork large processes anyway, at the cost of weaker guarantees: either the child borrows the parent's address space until it calls exec (vfork()), or the kernel promises more memory than it can back, so that a later page fault may fail (memory overcommit, controlled by /proc/sys/vm/overcommit_memory and /proc/sys/vm/overcommit_ratio).
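To see which overcommit policy a given Linux machine uses, the two files named above can simply be read. The sketch below assumes a Linux /proc filesystem; read_vm_setting is a hypothetical helper name, not part of any library.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: reads an integer from /proc/sys/vm/<name>.
// Returns -1 if the file is missing or unreadable (e.g. on non-Linux systems).
long read_vm_setting(const std::string& name) {
    std::ifstream file("/proc/sys/vm/" + name);
    long value = -1;
    if (!(file >> value)) {
        return -1;
    }
    return value;
}
```

Calling read_vm_setting("overcommit_memory") should give 0 (heuristic overcommit, the usual default), 1 (always overcommit) or 2 (strict accounting, where overcommit_ratio enters the commit limit). In heuristic mode the kernel still refuses requests it judges to be obvious overcommits, which is consistent with the clone() failure in the question.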

Note that the above applies equally to any library function that may create new processes, e.g. popen(). It does not apply to the exec() family, which replaces the current process image rather than cloning the process.
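One process-creating function that may behave differently is posix_spawn(). As far as I know, newer glibc versions (2.24 and later) implement it with a vfork-like clone() that does not duplicate the parent's address space, so it may succeed where fork() fails; on older versions it may fall back to a plain fork(), so treat this as something to test rather than a guarantee. The sketch below uses a hypothetical helper name, spawn_system, and assumes a shell at /bin/sh.

```cpp
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern "C" char** environ;  // the process environment, passed on to the shell

// Hypothetical helper: runs `command` through /bin/sh using posix_spawn().
// Returns the wait status, or -1 if the shell could not be started.
int spawn_system(const char* command) {
    char arg0[] = "sh";
    char arg1[] = "-c";
    char* const argv[] = {arg0, arg1, const_cast<char*>(command), nullptr};

    pid_t pid = 0;
    // posix_spawn reports failure through its return value, not through errno.
    int err = posix_spawn(&pid, "/bin/sh", nullptr, nullptr, argv, environ);
    if (err != 0) {
        return -1;
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    return status;  // decode with WIFEXITED / WEXITSTATUS
}
```

Unlike system(), this sketch does not touch signal dispositions while waiting, which may or may not matter for your program.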

If the provided mechanisms are inadequate for your use case, then you may need to implement your own system() replacement. I recommend starting a child process early on (before you allocate lots of memory) whose sole job is to accept NUL-separated command lines on stdin and report exit status on stdout.

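An outline of the latter solution looks something like:

#include <cstdlib>
#include <cstring>
#include <string>
#include <vector>
#include <unistd.h>

int request_fd[2];  // parent -> child: NUL-terminated command lines
int reply_fd[2];    // child -> parent: exit status as a raw int

// Child side: run each command received and send back its status.
void serve_commands() {
    std::string command;
    char c;
    while (read(request_fd[0], &c, 1) == 1) {
        if (c != '\0') { command += c; continue; }
        int result = std::system(command.c_str());
        if (write(reply_fd[1], &result, sizeof result) < 0) break;
        command.clear();
    }
    _exit(0);
}

// Call this before allocating lots of memory, while fork() is still cheap.
bool start_helper() {
    if (pipe(request_fd) != 0 || pipe(reply_fd) != 0) return false;
    pid_t pid = fork();
    if (pid < 0) return false;
    if (pid == 0) {
        /* in child */
        close(request_fd[1]);
        close(reply_fd[0]);
        serve_commands();  // never returns
    }
    /* in parent */
    close(request_fd[0]);
    close(reply_fd[1]);
    return true;
}

int my_system_replacement(const char* command) {
    int result = -1;
    // Send the command including its terminating NUL, then wait for the reply.
    if (write(request_fd[1], command, std::strlen(command) + 1) < 0) return -1;
    if (read(reply_fd[0], &result, sizeof result)
            != static_cast<ssize_t>(sizeof result)) return -1;
    return result;
}

int main() {
    if (!start_helper()) return 1;
    // Important: don't allocate until after the fork()
    std::vector<double> a(7e8, 1);  // allocate a big chunk of memory
    return my_system_replacement("exit 0") == 0 ? 0 : 1;
}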

You'll want to add appropriate error checks throughout, by reference to the man pages. And you might want to make it more object-oriented, and perhaps use iostreams for your read and write operations, etc.