
Troubleshooting Execution Failure of the musl-libc Shared Library


Problem Description

I am currently working on a shared library reconstruction project (on an x64 Linux virtual machine), specifically targeting musl-libc (version 1.1.15).

After reconstruction, the new library fails to execute (note that musl-libc's libc.so is itself also a binary executable).

Specifically, when I run the original musl-libc with:

./libc.so

I expect the following output:

musl libc (x86_64)
Version 1.1.15
Dynamic Program Loader
Usage: ./libc.so [options] [--] pathname [args]

However, running my reconstructed version results in:

bash: ./libc.so.rewritten: cannot execute binary file: Exec format error

Initial Analysis

My goal is to diagnose what causes the execution failure. I've checked the readelf -a output of my reconstructed library and everything looks "fine", so further analysis of the loading process is necessary to locate the problem.

As of now, I understand that a binary executable typically contains an .interp section to specify the path of the dynamic loader. However, musl-libc does not come with an .interp section. Here’s the output of readelf -lW ./libc.so for reference:

Elf file type is DYN (Shared object file)
Entry point 0x5d9c4
There are 7 program headers, starting at offset 64

Program Headers:
  Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  PHDR           0x000040 0x0000000000000040 0x0000000000000040 0x000188 0x000188 R   0x8
  LOAD           0x000000 0x0000000000000000 0x0000000000000000 0x08eda4 0x08eda4 R E 0x1000
  LOAD           0x08f980 0x0000000000090980 0x0000000000090980 0x000a70 0x003954 RW  0x1000
  DYNAMIC        0x08fe18 0x0000000000090e18 0x0000000000090e18 0x000130 0x000130 RW  0x8
  GNU_EH_FRAME   0x08ed80 0x000000000008ed80 0x000000000008ed80 0x000024 0x000024 R   0x4
  GNU_STACK      0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW  0
  GNU_RELRO      0x08f980 0x0000000000090980 0x0000000000090980 0x000680 0x000680 RW  0x10
...
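The missing interpreter can also be confirmed programmatically. The sketch below is an illustration rather than part of the original investigation: it reads the program header table of a little-endian ELF64 file and reports whether a PT_INTERP entry (type 3) is present. The function names are hypothetical. As far as I understand, when no PT_INTERP entry exists the kernel does not map a separate dynamic loader and simply jumps to the file's own entry point.

```python
import struct

PT_INTERP = 3  # program header type that names the dynamic loader


def program_header_types(data):
    """Return the p_type of every program header in an ELF64 little-endian image."""
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("not a little-endian ELF64 file")
    # e_phoff lives at byte 32; e_phentsize and e_phnum at bytes 54 and 56.
    (e_phoff,) = struct.unpack_from("<Q", data, 32)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 54)
    # p_type is the first 32-bit field of each program header entry.
    return [
        struct.unpack_from("<I", data, e_phoff + i * e_phentsize)[0]
        for i in range(e_phnum)
    ]


def has_interp(data):
    """True if the image requests an external interpreter via PT_INTERP."""
    return PT_INTERP in program_header_types(data)
```

To try it on a real file, read the file's bytes with open(path, "rb").read() and pass them to has_interp(). For the libc.so shown above, the readelf listing suggests the result would be False.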

Questions

  • Dynamic Loader Issue: Could the issue be related to the dynamic loader? If so, which dynamic loader is used to load musl-libc? Does it use the glibc dynamic loader or the musl-libc dynamic loader (i.e., musl-libc itself)?
  • OS Loading Mechanism: Alternatively, does the OS load musl-libc directly via execve? If this is the case, would adding logs in a specific kernel function (responsible for loading ELF binaries) help in diagnosing the issue?

Any advice or suggestions would be greatly appreciated. Thank you in advance!

Edit: Below is the readelf -h output for libc.so.rewritten.

ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              DYN (Shared object file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x191c1
  Start of program headers:          64 (bytes into file)
  Start of section headers:          867576 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         79
  Size of section headers:           64 (bytes)
  Number of section headers:         92
  Section header string table index: 91

Solution

  • Before loading an ELF binary, the kernel performs some sanity checks. Below is the source code copied from the function load_elf_phdrs() in fs/binfmt_elf.c:

    /* Sanity check the number of program headers... */
    /* ...and their total size. */
    size = sizeof(struct elf_phdr) * elf_ex->e_phnum;
    if (size == 0 || size > 65536 || size > ELF_MIN_ALIGN)
        goto out;
    

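    Here, the kernel requires the total size of the program header table to be no larger than ELF_MIN_ALIGN. According to the readelf -h output above, my reconstructed library has 79 program headers of 56 bytes each, i.e. 79 × 56 = 4424 bytes, which exceeds ELF_MIN_ALIGN (the page size, typically 4096 bytes on x86_64). The kernel therefore refuses to execute the reconstructed library.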
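The same check can be reproduced in user space before trying to run a rewritten binary. This is a minimal sketch, not kernel code: it assumes a little-endian ELF64 file, assumes ELF_MIN_ALIGN is 4096 (true for x86_64 with 4 KiB pages, but configuration dependent), and uses e_phentsize from the header in place of sizeof(struct elf_phdr). The function and constant names are hypothetical.

```python
import struct

# Assumption: ELF_MIN_ALIGN equals a 4 KiB page, as on typical x86_64 kernels.
ASSUMED_ELF_MIN_ALIGN = 4096


def phdr_table_size(header):
    """Return e_phentsize * e_phnum from the first 64 bytes of an ELF64 file."""
    if header[:4] != b"\x7fELF" or header[4] != 2 or header[5] != 1:
        raise ValueError("not a little-endian ELF64 header")
    # e_phentsize is at byte 54 and e_phnum at byte 56 of the ELF64 header.
    e_phentsize, e_phnum = struct.unpack_from("<HH", header, 54)
    return e_phentsize * e_phnum


def passes_size_check(size, limit=ASSUMED_ELF_MIN_ALIGN):
    """Mirror the condition quoted above: reject 0, > 65536, or > limit."""
    return not (size == 0 or size > 65536 or size > limit)


def check_file(path):
    """Read a file's ELF header and report (table size, passes check)."""
    with open(path, "rb") as f:
        size = phdr_table_size(f.read(64))
    return size, passes_size_check(size)
```

Calling check_file("./libc.so.rewritten") on a file with 79 headers of 56 bytes each should yield a size of 4424 and a failed check, while the original library with 7 headers (392 bytes) passes. Keeping the program header table within one page, for example by merging segments during reconstruction, is one way to satisfy the check.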