Understanding ELF Binary Size for nostdlib C Program


I'm on Ubuntu 20.04 with gcc 9.3.0 and ld 2.34. I have a simple hello-world program that does not use glibc or any other library and only makes the write syscall. Despite this, the binary is roughly 8 KB. I don't understand why it is that large rather than, say, 1 KB.

C Program:

int
x64_syscall_write(int fd, char const *data, unsigned long int data_size)
{
  int result = 0;
  __asm__ __volatile__("syscall"
              : "=a" (result)
              : "a" (1), "D" (fd),
                "S" (data), "d" (data_size)
              : "r11", "rcx", "memory");
  return result;
}

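__asm__(".global entry_point\n"
  "entry_point:\n"
  "xor rbp, rbp\n"                  /* clear the frame pointer: outermost frame */
  "pop rdi\n"                       /* argc is on top of the initial stack */
  "mov rsi, rsp\n"                  /* argv starts right after argc */
  "and rsp, 0xfffffffffffffff0\n"   /* align the stack to 16 bytes for the call */
  "call main\n"
  "mov rdi, rax\n"                  /* exit status = main's return value */
  "mov rax, 60\n"                   /* 60 = exit syscall on x86-64 */
  "syscall\n"
  "ret");                           /* not reached: exit does not return */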

int
main(int argc, char *argv[])
{
  x64_syscall_write(1, "hello\n", 6); 
  return 0;
}

Built with:

gcc -ffreestanding -static -nostdlib -no-pie -masm=intel \
-fno-unwind-tables -fno-asynchronous-unwind-tables \
-Wl,--gc-sections -fdata-sections -Os \
hello.c -c -o hello.o

# NOTE: I know more could be done here to shave 
# off a few more bytes, but I feel this is the bulk of it.

ld -e entry_point hello.o -o hello

hello.o is 1.7 KB; hello is 8.4 KB.


Solution

  • readelf -Wl hello
    
    Elf file type is EXEC (Executable file)
    Entry point 0x40101c
    There are 6 program headers, starting at offset 64
    
    Program Headers:
      Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
      LOAD           0x000000 0x0000000000400000 0x0000000000400000 0x0001b0 0x0001b0 R   0x1000
      LOAD           0x001000 0x0000000000401000 0x0000000000401000 0x000045 0x000045 R E 0x1000
      LOAD           0x002000 0x0000000000402000 0x0000000000402000 0x000007 0x000007 R   0x1000
      NOTE           0x000190 0x0000000000400190 0x0000000000400190 0x000020 0x000020 R   0x8
      GNU_PROPERTY   0x000190 0x0000000000400190 0x0000000000400190 0x000020 0x000020 R   0x8
      GNU_STACK      0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW  0x10
    
     Section to Segment mapping:
      Segment Sections...
       00     .note.gnu.property
       01     .text
       02     .rodata
       03     .note.gnu.property
       04     .note.gnu.property
       05
    

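    Here you can see that the linker created three LOAD segments: one for the ELF header and other metadata, one for .text, and one for .rodata. Each segment is aligned to 0x1000 (4 KB) in the file as well as in memory, so .text starts at file offset 0x1000 and .rodata at 0x2000, even though together they hold fewer than 100 bytes. That padding accounts for almost all of the 8 KB.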

    Linking with -z noseparate-code, which lets the linker put code and read-only data in the same segment as the headers, results in a much smaller binary (smaller than hello.o):

     ls -l hello*
    -rwxr-xr-x 1 user user 1384 Apr 26 22:24 hello
    -rw-r--r-- 1 user user  603 Apr 26 22:22 hello.c
    -rw-r--r-- 1 user user 1680 Apr 26 22:22 hello.o
    
    readelf -Wl hello
    
    Elf file type is EXEC (Executable file)
    Entry point 0x40015c
    There are 4 program headers, starting at offset 64
    
    Program Headers:
      Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
      LOAD           0x000000 0x0000000000400000 0x0000000000400000 0x00018c 0x00018c R E 0x1000
      NOTE           0x000120 0x0000000000400120 0x0000000000400120 0x000020 0x000020 R   0x8
      GNU_PROPERTY   0x000120 0x0000000000400120 0x0000000000400120 0x000020 0x000020 R   0x8
      GNU_STACK      0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW  0x10
    
     Section to Segment mapping:
      Segment Sections...
       00     .note.gnu.property .text .rodata
       01     .note.gnu.property
       02     .note.gnu.property
       03
    

    You can shrink this further by removing the .note.GNU-stack and .note.gnu.property sections from the object file before linking:

    objcopy -R '.note.*' hello.o hello1.o
    ld -e entry_point hello1.o -o hello1 -z noseparate-code
    
    ls -l hello1*
    -rwxr-xr-x 1 user user 1072 Apr 26 22:38 hello1
    -rw-r--r-- 1 user user 1440 Apr 26 22:37 hello1.o
    
    readelf -Wl hello1
    
    Elf file type is EXEC (Executable file)
    Entry point 0x400094
    There is 1 program header, starting at offset 64
    
    Program Headers:
      Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
      LOAD           0x000000 0x0000000000400000 0x0000000000400000 0x0000c4 0x0000c4 R E 0x1000
    
     Section to Segment mapping:
      Segment Sections...
       00     .text .rodata