I'm trying to use the clone() function in C, and am uncertain of how the second argument works. Per the clone() man page:
The child_stack argument specifies the location of the stack used by the
child process. Since the child and calling process may share memory, it
is not possible for the child process to execute in the same stack as the
calling process. The calling process must therefore set up memory space
for the child stack and pass a pointer to this space to clone(). Stacks
grow downwards on all processors that run Linux (except the HP PA
processors), so child_stack usually points to the topmost address of the memory
space set up for the child stack.
After following suggestions in the comments on this article, I've been able to get a simple example working using this C program:
#define _GNU_SOURCE   /* required for clone() */
#include <stdio.h>
#include <sched.h>
#include <stdlib.h>
#include <assert.h>
#include <sys/wait.h> /* waitpid(), WEXITSTATUS */

#define SIZE 65536

int v1;

int run(void *arg) {
    v1 = 42;
    return 0;
}

int main(int argc, char **argv) {
    void **child_stack;
    int pid, rc, status;

    v1 = 10;
    child_stack = (void **) malloc(SIZE);
    assert(child_stack != NULL);
    printf("v1 before: %d\n", v1);
    pid = clone(run, child_stack + SIZE/sizeof(void **), CLONE_VM, NULL);
    //pid = clone(run, child_stack + SIZE, CLONE_VM, NULL);
    assert(pid != -1);
    status = 0;
    rc = waitpid(pid, &status, __WALL);
    assert(rc != -1);
    assert(WEXITSTATUS(status) == 0);
    printf("v1 after: %d\n", v1);
    return 0;
}
But I'm confused as to why the particular pointer arithmetic in the clone line is necessary. Given that, according to the clone docs, the stack is supposed to grow downward, I see why you should add a value to the pointer returned by malloc before passing it in. But I'd expect to add the number of bytes malloc'd, instead of that value divided by 8 (on a 64-bit system), which is what seems to actually work. The code above seems to work fine regardless of what I define SIZE as, but if I use the commented version instead, which is what I'd expect to work, I get a segmentation fault for all SIZE values above a certain threshold.
Does anyone understand why the given clone line works, but the commented one doesn't?
As for why I'm using clone to begin with, instead of fork or pthreads, I'm trying to use some of its advanced sandboxing features to prevent an untrusted process from breaking out of a chroot jail, as described here.
With pointer arithmetic, the size of the type pointed to is incorporated when determining the actual memory offset. Take for example:
int a[2] = {1, 2};
int *p = a;
printf("%p: %p\n", (void *)&a[0], (void *)p);
printf("%p: %p\n", (void *)&a[1], (void *)(p + 1));
In this case, the value of p + 1 isn't just the address stored in p plus 1; it's that address plus 1 * sizeof(int) (the size of the type pointed to). To account for this, when you want to offset by some number of bytes, you need to divide the byte count by the size of the type the pointer points to. In your case, child_stack is a void **, so the type pointed to is void *, and it may be more accurate to say:
pid = clone(run, child_stack + SIZE/sizeof(void *), CLONE_VM, NULL);
You can visualize this behavior with something like:
int SIZE = 65536;
void **child_stack = (void **) malloc(SIZE);
void **child_stack_end = child_stack + SIZE;
void **child_stack_end2 = child_stack + SIZE / sizeof(*child_stack);
// Byte distance is SIZE * sizeof(void *): 524288 with 8-byte pointers, 262144 with 4-byte pointers
printf("%ld\n", (long)((intptr_t)child_stack_end - (intptr_t)child_stack));
// Byte distance is exactly SIZE: 65536
printf("%ld\n", (long)((intptr_t)child_stack_end2 - (intptr_t)child_stack));
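One way to sidestep the scaling entirely is to hold the stack in a char *. Since sizeof(char) is 1, adding SIZE to a char * advances by exactly SIZE bytes, which is the arithmetic the question expected. The following is only a sketch, reworking the program from the question under that assumption and assuming Linux with glibc. The names STACK_BYTES, stack_top and run_child_on_byte_stack are hypothetical helpers introduced here for illustration; they are not part of any API.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* required for clone() */
#endif
#include <assert.h>
#include <sched.h>
#include <stdlib.h>
#include <sys/wait.h>

#define STACK_BYTES 65536      /* hypothetical size, same value as SIZE in the question */

static int v1;

/* Child entry point: runs on the stack passed to clone(). */
static int run(void *arg) {
    (void)arg;
    v1 = 42;
    return 0;
}

/* Hypothetical helper: address one past the last byte of the buffer.
 * base is a char *, so base + nbytes moves by exactly nbytes bytes. */
static char *stack_top(char *base, size_t nbytes) {
    return base + nbytes;
}

/* Hypothetical helper: runs the child and returns v1 afterwards, or -1 on failure. */
static int run_child_on_byte_stack(void) {
    char *stack = malloc(STACK_BYTES);
    if (stack == NULL)
        return -1;

    v1 = 10;
    /* No division needed: the offset is already expressed in bytes. */
    int pid = clone(run, stack_top(stack, STACK_BYTES), CLONE_VM, NULL);
    if (pid == -1) {
        free(stack);
        return -1;
    }

    int status = 0;
    int rc = waitpid(pid, &status, __WALL);
    free(stack);               /* the child has exited, so its stack can be released */
    if (rc == -1 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;

    return v1;                 /* written by the child, visible here because of CLONE_VM */
}
```

Calling run_child_on_byte_stack() from main and printing the result would mirror the "v1 after" line of the original program. The design choice is only about the pointer type: with void ** the compiler scales every offset by sizeof(void *), while with char * the offset is taken literally as a byte count.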