I am trying to sandbox ELF binaries by (among other things) chrooting them after they have been launched. To do so, a child process cloned with the CLONE_FS flag performs the chroot, while the parent runs the binary by calling an exec function.
The trick works if the chroot happens after the program has finished loading the shared libraries it needs. The problem is that I can't find a way to detect, from the other process, when that has happened. Is there any way?
You can use a preload library with a function executed just prior to main(), a helper binary with the CAP_SYS_CHROOT filesystem capability in its permitted set, and a Unix domain socket pair between the two.
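As a minimal sketch of the "function executed just prior to main()" part: GCC and Clang support a constructor attribute that runs a function when the object is loaded, before main() is entered. The names premain_hook, premain_ran and premain_has_run are made up for this illustration; the actual preload library is premain.c further down.

```c
#include <stdio.h>

/* Hypothetical flag, only here so the effect can be observed. */
static int premain_ran = 0;

/* GCC/Clang extension: functions marked as constructors are run when
 * the object is loaded, before main() of the program is entered. */
static void premain_hook(void) __attribute__((constructor));
static void premain_hook(void)
{
    premain_ran = 1;
    fputs("premain_hook: running before main()\n", stderr);
}

/* Returns 1 if the constructor has already run, 0 otherwise. */
int premain_has_run(void)
{
    return premain_ran;
}
```

Any main() linked with this (or any program that has it preloaded as a shared library) should see premain_has_run() return 1 on its very first line.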
The helper binary creates the socket pair, then uses clone(CLONE_FS) to fork a helper process that shares its file system information (root directory, working directory, and umask). The helper binary itself then sets LD_PRELOAD to load the preload library, and executes the sandboxed binary. (exec resets the capabilities according to the sandboxed binary's filesystem capabilities, so the sandboxed binary will not have any extra privileges at all.)
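To see what "shares the file system information" means without needing any privileges, here is a rough sketch that uses chdir() in place of chroot(): a CLONE_FS child changes directory, and the parent's working directory follows. The names sketch_child, clone_fs_shares_cwd and SKETCH_STACK_SIZE are hypothetical, and unlike exec.c this sketch adds SIGCHLD to the clone flags so that a plain waitpid() can reap the child.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

#define SKETCH_STACK_SIZE 65536

/* Runs in the cloned child: change to the root directory and exit. */
static int sketch_child(void *unused)
{
    (void)unused;
    return chdir("/") ? 1 : 0;
}

/* Returns 1 if a chdir() done by a CLONE_FS child also moved the caller,
 * 0 if it did not, and -1 on error. The caller's original working
 * directory is restored before returning. */
int clone_fs_shares_cwd(void)
{
    char cwd[4096];
    char *stack;
    int saved, status, shared;
    pid_t child, w;

    /* Remember where we are, so the side effect can be undone. */
    saved = open(".", O_RDONLY | O_DIRECTORY);
    if (saved == -1)
        return -1;

    stack = malloc(SKETCH_STACK_SIZE);
    if (!stack) {
        close(saved);
        return -1;
    }

    /* Stacks grow down on most architectures, so pass the top of the
     * block. SIGCHLD makes the child waitable with a plain waitpid(). */
    child = clone(sketch_child, stack + SKETCH_STACK_SIZE,
                  CLONE_FS | SIGCHLD, NULL);
    if (child == -1) {
        free(stack);
        close(saved);
        return -1;
    }

    do {
        w = waitpid(child, &status, 0);
    } while (w == -1 && errno == EINTR);
    free(stack);

    if (w != child || !WIFEXITED(status) || WEXITSTATUS(status) != 0 ||
        !getcwd(cwd, sizeof cwd))
        shared = -1;
    else
        shared = !strcmp(cwd, "/");

    /* Undo the side effect on our own working directory. */
    if (fchdir(saved) == -1)
        shared = -1;
    close(saved);
    return shared;
}
```

The result is only informative when the caller starts somewhere other than /. The same sharing applies to the root directory, which is why a chroot() in the helper process takes effect in the sandboxed binary too.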
The helper process adds CAP_SYS_CHROOT to its effective set, waits for the sandboxed binary (preload library) to notify it via the socket, calls chroot(), and notifies the sandboxed binary (preload library) of success.
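The notification scheme just described can be sketched in isolation. This is only an unprivileged illustration under some assumptions: it uses fork() instead of clone(CLONE_FS), a comment stands in for the chroot() call, and the names send_all, recv_all and handshake_sketch are hypothetical.

```c
#include <errno.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/wait.h>

/* Send exactly len bytes; returns 0 on success, -1 on error. */
static int send_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = send(fd, p, len, MSG_NOSIGNAL);
        if (n > 0) {
            p += n;
            len -= (size_t)n;
        } else if (n != -1 || errno != EINTR)
            return -1;
    }
    return 0;
}

/* Receive exactly len bytes; returns 0 on success, -1 on error or
 * if the other end closes the socket early. */
static int recv_all(int fd, void *buf, size_t len)
{
    char *p = buf;
    while (len > 0) {
        ssize_t n = recv(fd, p, len, 0);
        if (n > 0) {
            p += n;
            len -= (size_t)n;
        } else if (n != -1 || errno != EINTR)
            return -1;
    }
    return 0;
}

/* Returns 0 if the full exchange succeeded, -1 otherwise. */
int handshake_sketch(void)
{
    int sv[2], status;
    pid_t helper, w, mine = getpid(), reply = -1;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;

    helper = fork();
    if (helper == -1) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    if (helper == 0) {
        /* Helper side: wait until the target says it is ready. */
        pid_t target = -1, self = getpid();
        close(sv[0]);
        if (recv_all(sv[1], &target, sizeof target) || target < 2)
            _exit(127);
        /* The real helper would call chroot() at this point. */
        if (send_all(sv[1], &self, sizeof self))
            _exit(127);
        close(sv[1]);
        _exit(0);
    }

    /* Target side: announce readiness, then wait for the reply. */
    close(sv[1]);
    if (send_all(sv[0], &mine, sizeof mine) ||
        recv_all(sv[0], &reply, sizeof reply))
        reply = -1;
    close(sv[0]);

    /* Reap the helper, so it does not linger as a zombie. */
    do {
        w = waitpid(helper, &status, 0);
    } while (w == -1 && errno == EINTR);

    if (w != helper || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return (reply == helper) ? 0 : -1;
}
```

The blocking recv() on each side is what provides the ordering: the target cannot proceed to main() until the helper has finished its privileged step and replied.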
Note: There is absolutely no need to mark the helper binary setuid root, or to give the sandboxed binary any capabilities or privileges. We can do this with minimal privileges: the CAP_SYS_CHROOT capability is sufficient.
I prefer to add the capability only to the binary's permitted set, so that the binary itself has to add the capability to its effective set before chroot() works. I feel this approach reduces the effects of possible installation/administrator errors. If you disagree, feel free to omit the corresponding code from exec.c, and use =pe instead of =p in the setcap command in the Makefile.
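To check which of the two sets a process actually has the capability in, something like the following sketch could be used. As an assumption to keep it free of the -lcap dependency, it calls the raw capget(2) system call through the kernel header instead of libcap; the function name chroot_cap_state is hypothetical.

```c
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/capability.h>

/* Reports whether CAP_SYS_CHROOT is in the permitted and effective
 * sets of the calling thread. Returns 0 on success, -1 on error. */
int chroot_cap_state(int *permitted, int *effective)
{
    struct __user_cap_header_struct hdr;
    struct __user_cap_data_struct data[_LINUX_CAPABILITY_U32S_3];
    const unsigned int idx = CAP_TO_INDEX(CAP_SYS_CHROOT);
    const unsigned int bit = CAP_TO_MASK(CAP_SYS_CHROOT);

    hdr.version = _LINUX_CAPABILITY_VERSION_3;
    hdr.pid = 0;    /* 0 refers to the calling thread */

    if (syscall(SYS_capget, &hdr, data) == -1)
        return -1;

    *permitted = !!(data[idx].permitted & bit);
    *effective = !!(data[idx].effective & bit);
    return 0;
}
```

With =p, I would expect the helper binary to start with permitted set and effective clear, and with =pe to start with both set; an ordinary unprivileged process should report neither.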
The neat thing here is that the preload library could also interpose desired C functions, and use the Unix domain socket to obtain the necessary information from the helper process; you can even use SCM_RIGHTS ancillary messages to transfer file descriptors from outside the chroot to the sandboxed binary. (In essence, this is what fakeroot does, but in reverse: instead of faking a chrooted environment, you can pick and choose which files the sandboxed binary can access from outside the chroot environment.) Just have the helper process stay alive as long as the other end of the socket is still open, so it'll exit after the sandboxed binary exits.
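The descriptor passing is not part of the example implementation below, so here is a minimal sketch of what it could look like. The function names send_fd and recv_fd are hypothetical, and EINTR handling is left out for brevity.

```c
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Send descriptor fd over the Unix domain socket sock, along with one
 * dummy data byte. Returns 0 on success, -1 on error. */
int send_fd(int sock, int fd)
{
    char byte = 0;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        struct cmsghdr hdr;     /* ensures correct alignment */
        char buf[CMSG_SPACE(sizeof (int))];
    } ctl;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&ctl, 0, sizeof ctl);
    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof ctl.buf;

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof (int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof (int));

    return (sendmsg(sock, &msg, MSG_NOSIGNAL) == 1) ? 0 : -1;
}

/* Receive one descriptor from the Unix domain socket sock.
 * Returns the new descriptor, or -1 on error. */
int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        struct cmsghdr hdr;
        char buf[CMSG_SPACE(sizeof (int))];
    } ctl;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd;

    memset(&ctl, 0, sizeof ctl);
    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof ctl.buf;

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;

    cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof (int))) {
        errno = EBADMSG;
        return -1;
    }

    memcpy(&fd, CMSG_DATA(cmsg), sizeof fd);
    return fd;
}
```

The helper process, which stays outside the chroot in this variant, would open the file and call send_fd(); the preload library would call recv_fd() and hand the resulting descriptor to the sandboxed binary from an interposed function.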
Here is my example implementation that starts the helper process as a child process to the sandboxed binary, with the helper process exiting (and preload library reaping it) before the sandboxed main() is started.
exec.c:
#define _GNU_SOURCE
#define _POSIX_C_SOURCE 200809L
#include <unistd.h>
#include <sys/capability.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/mman.h>
#include <sched.h>
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include <errno.h>
#ifndef SOCKET_FD
#error SOCKET_FD not defined!
#endif
#ifndef LIBRARY_PATH
#error LIBRARY_PATH not defined!
#endif
static size_t helper_stack_size = 32768;
static void *helper_stack = NULL;
static const char *helper_chroot = NULL;
static const cap_value_t helper_cap[] = { CAP_SYS_CHROOT };
static const int helper_caps = sizeof helper_cap / sizeof helper_cap[0];
static int socket_fd[2] = { -1, -1 };
#ifdef __hppa
#define helper_endstack (helper_stack)
#else
/* Stack grows down: pass the (aligned) address just past the end of the mapping. */
#define helper_endstack ((void *)((char *)helper_stack + helper_stack_size))
#endif

static int helper_main(void *arg)
{
    const char *const argv0 = arg;
    pid_t pid;
    cap_t caps;

    close(socket_fd[0]);

    /* Read the target PID. */
    {   char *p = (char *)(&pid);
        char *const q = (char *)(&pid) + sizeof pid;
        ssize_t n;

        while (p < q) {
            n = recv(socket_fd[1], p, (size_t)(q - p), MSG_WAITALL);
            if (n > (ssize_t)0)
                p += n;
            else
            if (n != (ssize_t)-1) {
                fprintf(stderr, "%s: %s.\n", argv0, strerror(EIO));
                return 127;
            } else
            if (errno != EINTR) {
                fprintf(stderr, "%s: %s.\n", argv0, strerror(errno));
                return 127;
            }
        }
    }

    if (pid < (pid_t)2) {
        shutdown(socket_fd[1], SHUT_RDWR);
        close(socket_fd[1]);
        return 127;
    }

    /* Enable CAP_SYS_CHROOT. */
    caps = cap_get_proc();
    if (cap_set_flag(caps, CAP_EFFECTIVE, helper_caps, helper_cap, CAP_SET)) {
        shutdown(socket_fd[1], SHUT_RDWR);
        close(socket_fd[1]);
        fprintf(stderr, "%s: %s.\n", argv0, strerror(errno));
        return 127;
    }
    if (cap_set_proc(caps)) {
        shutdown(socket_fd[1], SHUT_RDWR);
        close(socket_fd[1]);
        fprintf(stderr, "%s: %s.\n", argv0, strerror(errno));
        return 127;
    }

    /* Target is ready to be chrooted, so do it now. */
    if (chroot(helper_chroot)) {
        shutdown(socket_fd[1], SHUT_RDWR);
        close(socket_fd[1]);
        fprintf(stderr, "%s: Cannot chroot: %s.\n", argv0, strerror(errno));
        return 127;
    }

    /* Send my own pid, so this process will be reaped. */
    {   const char *p = (char *)(&pid);
        const char *const q = (char *)(&pid) + sizeof pid;
        ssize_t n;

        pid = getpid();

        while (p < q) {
            n = send(socket_fd[1], p, (size_t)(q - p), MSG_NOSIGNAL);
            if (n > (ssize_t)0)
                p += n;
            else
            if (n != (ssize_t)-1) {
                fprintf(stderr, "%s: %s.\n", argv0, strerror(EIO));
                return 127;
            } else
            if (errno != EINTR) {
                fprintf(stderr, "%s: %s.\n", argv0, strerror(errno));
                return 127;
            }
        }
    }

    /* We won't be sending anything else. */
    shutdown(socket_fd[1], SHUT_WR);

    /* Ignore further input; wait for other end to close descriptor. */
    {   char buf[16];
        ssize_t n;

        while (1) {
            n = recv(socket_fd[1], buf, sizeof buf, 0);
            if (n > (ssize_t)0)
                continue;
            else
            if (n == (ssize_t)0)
                break;
            else
            if (n != (ssize_t)-1) {
                fprintf(stderr, "%s: %s.\n", argv0, strerror(EIO));
                return 127;
            } else
            if (errno == EPIPE)
                break;
            else
            if (errno != EINTR) {
                fprintf(stderr, "%s: %s.\n", argv0, strerror(errno));
                return 127;
            }
        }
    }

    /* Close the socket, and exit. */
    shutdown(socket_fd[1], SHUT_RDWR);
    close(socket_fd[1]);
    return 0;
}

int main(int argc, char *argv[])
{
    if (argc < 4 || !strcmp(argv[1], "-h") || !strcmp(argv[1], "--help")) {
        fprintf(stderr, "\n");
        fprintf(stderr, "Usage: %s [ -h | --help ]\n", argv[0]);
        fprintf(stderr, "       %s CHROOT WORKDIR COMMAND [ ARGS ... ]\n", argv[0]);
        fprintf(stderr, "\n");
        fprintf(stderr, "Note: . is a valid WORKDIR.\n");
        fprintf(stderr, "\n");
        return 1;
    }

    if (chdir(argv[2])) {
        fprintf(stderr, "%s: %s.\n", argv[2], strerror(errno));
        return 1;
    }

    helper_stack = mmap(NULL, helper_stack_size, PROT_READ | PROT_WRITE,
                        MAP_ANONYMOUS | MAP_PRIVATE | MAP_STACK | MAP_GROWSDOWN, -1, (off_t)0);
    if ((void *)helper_stack == MAP_FAILED) {
        fprintf(stderr, "Cannot create helper process stack: %s.\n", strerror(errno));
        return 1;
    }

    helper_chroot = argv[1];

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, socket_fd)) {
        fprintf(stderr, "Cannot create a Unix domain stream socket pair: %s.\n", strerror(errno));
        return 1;
    }

    if (clone(helper_main, helper_endstack, CLONE_FS, argv[0]) == -1) {
        fprintf(stderr, "Cannot clone a helper process: %s.\n", strerror(errno));
        close(socket_fd[0]);
        close(socket_fd[1]);
        return 1;
    }

    close(socket_fd[1]);

    /* Move our end of the socket pair to the descriptor the preload library expects. */
    if (socket_fd[0] != SOCKET_FD) {
        if (dup2(socket_fd[0], SOCKET_FD) == -1) {
            fprintf(stderr, "Cannot move stream socket: %s.\n", strerror(errno));
            close(socket_fd[0]);
            close(SOCKET_FD);
            return 1;
        }
        close(socket_fd[0]);
    }

    setenv("LD_PRELOAD", LIBRARY_PATH, 1);

    /* Capabilities are reset over an execve(). */
    execvp(argv[3], argv + 3);

    /* execvp() only returns if it failed. */
    close(SOCKET_FD);
    fprintf(stderr, "%s: %s.\n", argv[3], strerror(errno));
    return 1;
}
premain.c:
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <fcntl.h>
#include <string.h>
#include <errno.h>
#ifndef SOCKET_FD
#error SOCKET_FD is not defined!
#endif
static void init(void) __attribute__ ((constructor (65535)));
static void init(void)
{
    pid_t pid;

    /* Note: We could probably only remove libpremain.so
     *       from the value, instead of clearing it altogether. */
    unsetenv("LD_PRELOAD");

    /* Verify SOCKET_FD is a Unix domain socket. */
    {   struct sockaddr_un addr;
        socklen_t addrlen = sizeof addr;

        memset(&addr, 0, sizeof addr);
        errno = EIO;
        if (getsockname(SOCKET_FD, (struct sockaddr *)&addr, &addrlen))
            switch (errno) {
            case EBADF:
                /* SOCKET_FD is not open. Continue as if libpremain.so was never loaded. */
                errno = 0;
                return;
            case ENOTSOCK:
                /* SOCKET_FD is not a socket. Continue as if libpremain.so was never loaded. */
                errno = 0;
                return;
            default:
                /* All other errors are fatal. */
                exit(127);
            }

        if (addr.sun_family != AF_UNIX) {
            /* SOCKET_FD is not a Unix domain socket. Continue as if libpremain.so was never loaded. */
            errno = 0;
            return;
        }
    }

    /* Make SOCKET_FD blocking and close-on-exec. */
    if (fcntl(SOCKET_FD, F_SETFD, (long)FD_CLOEXEC) ||
        fcntl(SOCKET_FD, F_SETFL, (long)0L))
        exit(127);

    /* Send our PID. */
    {   const char *p = (const char *)(&pid);
        const char *const q = (const char *)(&pid) + sizeof pid;

        pid = getpid();

        while (p < q) {
            ssize_t n = send(SOCKET_FD, p, (size_t)(q - p), MSG_NOSIGNAL);
            if (n > (ssize_t)0)
                p += n;
            else
            if (n != (ssize_t)-1)
                exit(127);
            else
            if (errno != EINTR)
                exit(127);
        }
    }

    /* Receive the PID from the other end. */
    {   char *p = (char *)(&pid);
        char *const q = (char *)(&pid) + sizeof pid;

        pid = (pid_t)-1;

        while (p < q) {
            ssize_t n = recv(SOCKET_FD, p, (size_t)(q - p), MSG_WAITALL);
            if (n > (ssize_t)0)
                p += n;
            else
            if (n != (ssize_t)-1)
                exit(127);
            else
            if (errno != EINTR)
                exit(127);
        }
    }

    shutdown(SOCKET_FD, SHUT_RDWR);
    close(SOCKET_FD);

    /* If the PID is > 1, we wait for it to exit.
     * If an error occurs, it's not a problem. */
    if (pid > (pid_t)1) {
        pid_t p;

        do {
            p = waitpid(pid, NULL, 0);
        } while (p == (pid_t)-1 && errno == EINTR);
    }

    /* All done. */
    return;
}
Makefile:
CC      := gcc
CFLAGS  := -Wall -O3
LD      := $(CC)
LDFLAGS := -lcap

PREFIX  := /usr
BINDIR  := $(PREFIX)/bin
LIBDIR  := $(PREFIX)/lib

SOCKFD  := 15

.PHONY: all clean install uninstall

all: clean libpremain.so exec-chroot

clean:
	rm -f libpremain.so exec-chroot

libpremain.so: premain.c
	$(CC) $(CFLAGS) -DSOCKET_FD=$(SOCKFD) -fPIC -shared $^ -ldl -Wl,-soname,$@ $(LDFLAGS) -o $@

exec-chroot: exec.c
	$(CC) $(CFLAGS) -DSOCKET_FD=$(SOCKFD) -DLIBRARY_PATH='"'$(LIBDIR)/libpremain.so'"' $^ $(LDFLAGS) -o $@

install: libpremain.so exec-chroot
	sudo rm -f $(LIBDIR)/libpremain.so $(BINDIR)/exec-chroot
	sudo install -o `id -un` -g `id -gn` -m 00770 libpremain.so $(LIBDIR)/libpremain.so
	sudo install -o `id -un` -g `id -gn` -m 00770 exec-chroot $(BINDIR)/exec-chroot
	sudo setcap 'cap_sys_chroot=p' $(BINDIR)/exec-chroot

uninstall:
	sudo rm -f $(LIBDIR)/libpremain.so $(BINDIR)/exec-chroot
Note that the indentation in the Makefile is with tabs, not spaces. Run
make PREFIX=/usr/local clean install
to compile and install to /usr/local, with the files accessible only to the current user (and group). You can also use clean all to only recompile everything, or uninstall to uninstall the binaries.
This does require the libcap library. It is maintained alongside the kernel (its sources are hosted on kernel.org), but you might need to install a libcap-dev or libcap-devel or similarly named package to get all the necessary files to compile against it.
After installing, you can run e.g.
exec-chroot /tmp /tmp ls -alF /
to run ls -alF / in /tmp chrooted to /tmp. The output on Ubuntu machines is typically something like
drwxrwxrwt 11 0 0 4096 May 29 23:55 ./
drwxrwxrwt 11 0 0 4096 May 29 23:55 ../
drwxrwxrwt 2 0 0 4096 May 29 17:15 .ICE-unix/
-r--r--r-- 1 0 0 11 May 29 17:15 .X0-lock
drwxrwxrwt 2 0 0 4096 May 29 17:15 .X11-unix/
drwx------ 2 1000 1000 4096 May 29 17:15 .esd-1000/
drwx------ 2 0 0 16384 Dec 2 2011 lost+found/
drwx------ 2 1000 1000 4096 May 29 17:15 pulse-xxxxxxxxx/
drwx------ 2 0 0 4096 May 29 17:15 pulse-yyyyyyyyy/
where the owner and group are shown as numeric IDs (0 for root and 1000 for the user), because the passwd and group databases are inaccessible from within the chroot. However, as I already mentioned, this can be worked around by modifying and extending the above code as outlined.
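The numeric output comes from the usual fallback when a user ID cannot be resolved to a name. Roughly, and only as a sketch (owner_label is a hypothetical name, and this is not the actual ls source), the lookup behaves like this:

```c
#include <pwd.h>
#include <stdio.h>
#include <sys/types.h>

/* Writes the user name for uid into buf, or the decimal uid if the
 * lookup fails (for example because /etc/passwd cannot be read from
 * inside the chroot). Returns buf. */
char *owner_label(uid_t uid, char *buf, size_t size)
{
    const struct passwd *pw = getpwuid(uid);

    if (pw && pw->pw_name && pw->pw_name[0])
        snprintf(buf, size, "%s", pw->pw_name);
    else
        snprintf(buf, size, "%lu", (unsigned long)uid);

    return buf;
}
```

An interposed getpwuid() in the preload library, answering from the helper process over the socket, would be one way to get names back.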
Although I did try to write the code with careful error handling, I have not really considered the overall operation thoroughly with respect to error conditions or security issues; that is why the files are installed only accessible to the current user.
Questions?