
Running a binary directly works fine, but fails with execve


I have a very curious case that I spent a week trying to debug to no avail. For context, I want to start playing with the nix package manager and I made myself a tiny chroot environment from one of the Alpine base images. I have successfully installed the nix package manager as can be seen from:

# nix --version
nix (Nix) 2.9.2

I have executed the usual

# nix-channel --add "https://nixos.org/channels/nixpkgs-unstable"

And verified that it works successfully by running:

# nix-channel --list
nixpkgs https://nixos.org/channels/nixpkgs-unstable

However, when I try to run nix-channel --update, I get:

# nix-channel --update
unpacking channels...
error: executing '/usr/bin/nix-env': No such file or directory
error: program '/usr/bin/nix-env' failed with exit code 1

Okay, it says nix-env is not available, but:

# /usr/bin/nix-env --version
nix-env (Nix) 2.9.2
# type -p nix-env
/usr/bin/nix-env
# nix-env --version
nix-env (Nix) 2.9.2

It actually exists, so I started to dig deeper and got this strace output (relevant excerpt):

[pid 30861] vfork(strace: Process 30880 attached
 <unfinished ...>
[pid 30880] prctl(PR_SET_PDEATHSIG, SIGKILL) = 0
[pid 30880] dup2(12, 1)                 = 1
[pid 30880] rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
[pid 30880] getcwd("/", 4096)           = 2
[pid 30880] setns(3, CLONE_NEWNS)       = 0
[pid 30880] chdir("/")                  = 0
[pid 30880] prlimit64(0, RLIMIT_STACK, NULL, {rlim_cur=65536*1024, rlim_max=RLIM64_INFINITY}) = 0
[pid 30880] prlimit64(0, RLIMIT_STACK, {rlim_cur=8192*1024, rlim_max=RLIM64_INFINITY}, NULL) = 0
[pid 30880] mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7efc7fa67000
[pid 30880] execve("/usr/bin/nix-env", ["/usr/bin/nix-env", "--profile", "/nix/var/nix/profiles/per-user/r"..., "--file", "/tmp/nix.pmkgai", "--install", "--remove-all", "--from-expression", "f: f { name = \"nixpkgs\"; channel"..., "--quiet"], 0x7ffc38f7aac0 /* 11 vars */) = -1 ENOENT (No such file or directory)

Very strange. Out of curiosity, I got the nix sources and added some debugging output around the execve invocation. The command line that nix executes works fine when I run it manually on the console. When I replaced the call to /usr/bin/nix-env with (a slightly modified version of) /bin/echo, it executed successfully and produced the result I expected. (The nix developers should really look at providing better runtime debugging information: at the moment nix-env is execve'd with the --quiet flag and its output is not propagated back to the user, so it is not possible to know what happened. But that's a GitHub issue for another day. /rant off)

Anyway, I digress. My question is: what could be causing this? Why does nix-env work just fine from the command line, but fail under execve?

I did my usual checks on what interpreter is expected:

# readelf -l /usr/bin/nix-env | grep interpreter
      [Requesting program interpreter: /lib/ld-musl-x86_64.so.1]
# ls -la /lib/ld-musl-x86_64.so.1
-rwxr-xr-x    1 root     root        604704 Apr  8 05:38 /lib/ld-musl-x86_64.so.1

As well as making sure all dynamic libraries are properly resolved:

# ldd /usr/bin/nix-env 
    /lib/ld-musl-x86_64.so.1 (0x7fc0b3d20000)
    libnixexpr.so => /usr/lib/libnixexpr.so (0x7fc0b378a000)
    libgc.so.1 => /usr/lib/libgc.so.1 (0x7fc0b3721000)
    libnixmain.so => /usr/lib/libnixmain.so (0x7fc0b36d7000)
    libnixfetchers.so => /usr/lib/libnixfetchers.so (0x7fc0b35dd000)
    libnixstore.so => /usr/lib/libnixstore.so (0x7fc0b3229000)
    libnixutil.so => /usr/lib/libnixutil.so (0x7fc0b30eb000)
    libnixcmd.so => /usr/lib/libnixcmd.so (0x7fc0b3027000)
    libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x7fc0b2dd9000)
    libgcc_s.so.1 => /usr/lib/libgcc_s.so.1 (0x7fc0b2dba000)
    libc.musl-x86_64.so.1 => /lib/ld-musl-x86_64.so.1 (0x7fc0b3d20000)
    libboost_context.so.1.80.0 => /usr/lib/libboost_context.so.1.80.0 (0x7fc0b2db5000)
    libsqlite3.so.0 => /usr/lib/libsqlite3.so.0 (0x7fc0b2cbd000)
    libcurl.so.4 => /usr/lib/libcurl.so.4 (0x7fc0b2c3f000)
    libsodium.so.23 => /usr/lib/libsodium.so.23 (0x7fc0b2bed000)
    libseccomp.so.2 => /usr/lib/libseccomp.so.2 (0x7fc0b2bd1000)
    libcrypto.so.3 => /usr/lib/libcrypto.so.3 (0x7fc0b2817000)
    libbrotlidec.so.1 => /usr/lib/libbrotlidec.so.1 (0x7fc0b280b000)
    libbrotlienc.so.1 => /usr/lib/libbrotlienc.so.1 (0x7fc0b2787000)
    libarchive.so.13 => /usr/lib/libarchive.so.13 (0x7fc0b26e0000)
    libcpuid.so.15 => /usr/lib/libcpuid.so.15 (0x7fc0b26c5000)
    libeditline.so.1 => /usr/lib/libeditline.so.1 (0x7fc0b26ba000)
    libnghttp2.so.14 => /usr/lib/libnghttp2.so.14 (0x7fc0b2691000)
    libssl.so.3 => /usr/lib/libssl.so.3 (0x7fc0b25fc000)
    libz.so.1 => /lib/libz.so.1 (0x7fc0b25e2000)
    libbrotlicommon.so.1 => /usr/lib/libbrotlicommon.so.1 (0x7fc0b25bf000)
    libacl.so.1 => /lib/libacl.so.1 (0x7fc0b25b5000)
    libexpat.so.1 => /usr/lib/libexpat.so.1 (0x7fc0b2590000)
    liblzma.so.5 => /usr/lib/liblzma.so.5 (0x7fc0b256d000)
    libzstd.so.1 => /usr/lib/libzstd.so.1 (0x7fc0b24f7000)
    liblz4.so.1 => /usr/lib/liblz4.so.1 (0x7fc0b24d8000)
    libbz2.so.1 => /usr/lib/libbz2.so.1 (0x7fc0b24c9000)

Any pointers in the right direction are highly appreciated.

EDIT: Adding a strace -v run to see what environment is passed down to the nix-env invocation (as suggested in one of the comments):

[pid 18800] execve("/usr/bin/nix-env", ["/usr/bin/nix-env", "--profile", "/nix/var/nix/profiles/per-user/r"..., "--file", "/tmp/nix.aCocaf", "--install", "--remove-all", "--from-expression", "f: f { name = \"nixpkgs\"; channel"..., "--quiet"], ["CHARSET=UTF-8", "PWD=/root", "HOME=/root", "LANG=C.UTF-8", "TMPDIR=/tmp", "SHLVL=1", "PAGER=less", "PS1=\\h:\\w\\$ ", "LC_COLLATE=C", "PATH=/nix/var/nix/profiles/defau"..., "OLDPWD=/root", "_=/usr/bin/strace"]) = -1 ENOENT (No such file or directory)

Unfortunately, no LD_* environment variables are manipulated or passed down to the nix-env call.

EDIT: Digging further, I built a very minimal statically linked Rust app that writes its command line and environment to a log file when it is executed, and replaced nix-env with it. I get the same problem.


Solution

  • You say that you use a chroot environment. Your strace log contains setns(3, CLONE_NEWNS), which means that nix uses mount namespaces. Looking at the nix source code, it seems that when nix starts, it saves the current mount namespace and restores it in forked children with the setns call. Since joining a mount namespace with setns also resets the process's root directory to the root of that namespace, the child may escape from the chroot to a place where your binaries are not available. The path /usr/bin/nix-env would then be looked up outside the chroot, and execve fails with ENOENT. This would explain why neither nix-env nor your static Rust binary could be executed, while /bin/echo could (provided that /bin/echo also exists outside your chroot).
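
One way to check this hypothesis from inside the chroot is to compare the shell's root directory with the root directory of PID 1. The following is only a rough diagnostic sketch, not anything nix provides: same_dir and report_root are hypothetical helper names, and it assumes that /proc is mounted inside the chroot, that you run it as root, that PID 1 lives in the real (non-chroot) root, and that stat supports -c (GNU coreutils and BusyBox both do).

```shell
# Succeed (exit 0) if both paths are the same directory, i.e. they have
# the same device and inode numbers. Exit 1 if they differ, and exit 2
# if either path cannot be inspected.
same_dir() {
    a=$(stat -c '%d:%i' "$1") || return 2
    b=$(stat -c '%d:%i' "$2") || return 2
    [ "$a" = "$b" ]
}

# Hypothetical helper: report whether this shell's root directory is the
# same directory as the one given as the first argument.
report_root() {
    same_dir / "$1"
    case $? in
        0) echo "same root" ;;
        1) echo "different root" ;;
        *) echo "cannot tell" ;;
    esac
}
```

Running report_root /proc/1/root/ inside the chroot should print "different root" if the chroot is in effect for your shell. In that case a child process that ends up in the root of the mount namespace would see a different /usr/bin than your shell does, which would be consistent with the ENOENT from execve. Checking whether /usr/bin/nix-env exists under /proc/1/root/ would then tell you what such a child can actually find.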