Tags: c, linux, x86, elf, strip

Investigating the size of an extremely small C program


I'm investigating the size of an extremely small C program on Linux (Ubuntu 20.04).

I'm compiling as follows:

gcc -s -nostdlib test.c -o test

the following program:

__attribute__((naked))
void _start() {
    asm("movl $1,%eax;"
    "xorl %ebx,%ebx;"
    "int  $0x80");
}

Basically, the idea is to make the Linux exit system call directly rather than depending on the C runtime to do that for us (which is what would happen with int main(void) { }). The program moves 1, the number of the exit system call in the 32-bit (i386) Linux ABI, into register EAX, clears register EBX, which carries the exit status passed to the kernel (so the program exits with status 0), and then executes the Linux system call interrupt, int $0x80. This interrupt triggers the kernel to process our call.
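
For comparison, int $0x80 is the legacy 32-bit interface; on x86-64 kernels built with IA-32 emulation (the usual case) it still works, but the native x86-64 interface is the syscall instruction, where exit is system call number 60 and the status goes in RDI. The sketch below assumes an x86-64 Linux target and GCC/Clang extended inline asm. raw_exit and exit_status_via_raw_syscall are hypothetical names, and the fork/waitpid wrapper exists only so the exit status can be observed without ending the calling process.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Terminate the calling process through the native x86-64 Linux
   system call interface: RAX = 60 (exit), RDI = exit status, then
   `syscall`. The instruction itself clobbers RCX and R11. */
static void __attribute__((noreturn)) raw_exit(int status)
{
    __asm__ volatile("syscall"
                     : /* no outputs */
                     : "a"(60L), "D"((long)status)
                     : "rcx", "r11", "memory");
    __builtin_unreachable();
}

/* Hypothetical harness: run raw_exit(status) in a child process and
   return the exit status the parent observes, or -1 on error. */
int exit_status_via_raw_syscall(int status)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        raw_exit(status); /* child: never returns */

    int wstatus;
    if (waitpid(pid, &wstatus, 0) < 0 || !WIFEXITED(wstatus))
        return -1;
    return WEXITSTATUS(wstatus);
}
```

Inside a -nostdlib _start like the one above, the equivalent 64-bit instruction sequence would be movl $60,%eax; xorl %edi,%edi; syscall.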

I would expect this program to be extremely small (less than 1K), but:

du -h test
# >> 16K
ldd test
# >> statically linked

Why is this program still 16K?


Solution

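  • du reports the disk space allocated to a file, which is rounded up to whole filesystem blocks (commonly 4 KiB), whereas ls -l reports the file's actual (apparent) size in bytes. For small files, the size reported by du is therefore often much larger than the actual size.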
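
    The difference shows up in two fields returned by stat(2): st_size is the apparent size that ls -l prints, and st_blocks counts the 512-byte units actually allocated, which is what du adds up. A minimal sketch, assuming Linux semantics for st_blocks; apparent_size and allocated_size are hypothetical helper names:

```c
#include <sys/stat.h>

/* Apparent size in bytes (st_size), i.e. what `ls -l` prints.
   Returns -1 if stat() fails. */
long long apparent_size(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    return (long long)st.st_size;
}

/* Space allocated on disk. On Linux st_blocks is counted in 512-byte
   units; this is the quantity `du` sums before formatting it.
   Returns -1 if stat() fails. */
long long allocated_size(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    return (long long)st.st_blocks * 512;
}
```

    Assuming 4 KiB filesystem blocks and no sparse regions, the 8840-byte binary shown below would occupy three blocks (3 × 4096 = 12288 bytes), so du -h would typically print 12K for it while ls -l prints 8840.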

    You can significantly reduce the size of the binary by changing compile and linking options and stripping out unnecessary sections.

    $ cat test.c
    void _start() {
        asm("movl $1,%eax;"
        "xorl %ebx,%ebx;"
        "int  $0x80");
    }
    
    $ gcc -s -nostdlib test.c -o test
    $ ./test
    $ ls -l test
    -rwxrwxr-x 1 fpm fpm 8840 Dec  9 04:09 test
    
    $ readelf -W --section-headers test
    There are 7 section headers, starting at offset 0x20c8:
    
    Section Headers:
      [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
      [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
      [ 1] .note.gnu.build-id NOTE            0000000000400190 000190 000024 00   A  0   0  4
      [ 2] .text             PROGBITS        0000000000401000 001000 000010 00  AX  0   0  1
      [ 3] .eh_frame_hdr     PROGBITS        0000000000402000 002000 000014 00   A  0   0  4
      [ 4] .eh_frame         PROGBITS        0000000000402018 002018 000038 00   A  0   0  8
      [ 5] .comment          PROGBITS        0000000000000000 002050 00002e 01  MS  0   0  1
      [ 6] .shstrtab         STRTAB          0000000000000000 00207e 000045 00      0   0  1
    Key to Flags:
      W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
      L (link order), O (extra OS processing required), G (group), T (TLS),
      C (compressed), x (unknown), o (OS specific), E (exclude),
      l (large), p (processor specific)
    $
    
    $ gcc -s -nostdlib -Wl,--nmagic test.c -o test
    $ ls -l test
    -rwxrwxr-x 1 fpm fpm 984 Dec  9 16:55 test
    $ strip -R .comment -R .note.gnu.build-id test
    $ strip -R .eh_frame_hdr -R .eh_frame test
    $ ls -l test
    -rwxrwxr-x 1 fpm fpm 520 Dec  9 17:03 test
    $ 
    
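
    Most of the 8840 bytes is alignment padding: in the first readelf output, .text starts at file offset 0x1000 and .eh_frame_hdr at 0x2000, because ld by default aligns each loadable segment to a page boundary (4 KiB here). -Wl,--nmagic passes --nmagic (-n) to ld, which turns off page alignment of sections, so the sections are packed together and the file shrinks to 984 bytes.
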
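    The page alignment lives in the p_align field of each PT_LOAD program header; the ELF format requires p_offset and p_vaddr of a loadable segment to be congruent modulo p_align, which is what forces the gaps in the file. The helper below is a minimal sketch that reads the program headers using the structures from glibc's <elf.h> and reports the largest PT_LOAD alignment. max_load_align is a hypothetical name, and the code assumes a 64-bit ELF file in host byte order, as on x86-64 Linux.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Return the largest p_align among the PT_LOAD program headers of an
   ELF64 file, or -1 on error (or if there is no PT_LOAD segment).
   Assumes the file's byte order matches the host's. */
long long max_load_align(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    Elf64_Ehdr eh;
    long long best = -1;
    int ok = fread(&eh, sizeof eh, 1, f) == 1 &&
             memcmp(eh.e_ident, ELFMAG, SELFMAG) == 0 &&
             eh.e_ident[EI_CLASS] == ELFCLASS64;

    /* Program headers start at e_phoff, each e_phentsize bytes long. */
    for (unsigned i = 0; ok && i < eh.e_phnum; i++) {
        Elf64_Phdr ph;
        long off = (long)(eh.e_phoff + (Elf64_Off)i * eh.e_phentsize);
        if (fseek(f, off, SEEK_SET) != 0 || fread(&ph, sizeof ph, 1, f) != 1) {
            ok = 0;
            break;
        }
        if (ph.p_type == PT_LOAD && (long long)ph.p_align > best)
            best = (long long)ph.p_align;
    }

    fclose(f);
    return ok ? best : -1;
}
```

    The p_align values can also be compared with readelf -W -l test before and after adding -Wl,--nmagic.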
    Note that in this particular case clang, linking with lld, produces a much smaller unstripped binary than gcc (1344 bytes versus 8840). However, after stripping the unnecessary sections, the clang binary is 736 bytes, which is larger than the 520 bytes achieved with gcc -s -nostdlib -Wl,--nmagic test.c -o test.

    $ clang -static -nostdlib -flto -fuse-ld=lld -o test test.c
    $ ls -l test
    -rwxrwxr-x 1 fpm fpm 1344 Dec  9 04:15 test
    $
    
    $ readelf -W --section-headers test
    There are 9 section headers, starting at offset 0x300:
    
    Section Headers:
      [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
      [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
      [ 1] .note.gnu.build-id NOTE            0000000000200190 000190 000018 00   A  0   0  4
      [ 2] .eh_frame_hdr     PROGBITS        00000000002001a8 0001a8 000014 00   A  0   0  4
      [ 3] .eh_frame         PROGBITS        00000000002001c0 0001c0 00003c 00   A  0   0  8
      [ 4] .text             PROGBITS        0000000000201200 000200 00000f 00  AX  0   0 16
      [ 5] .comment          PROGBITS        0000000000000000 00020f 000040 01  MS  0   0  1
      [ 6] .symtab           SYMTAB          0000000000000000 000250 000048 18      8   2  8
      [ 7] .shstrtab         STRTAB          0000000000000000 000298 000055 00      0   0  1
      [ 8] .strtab           STRTAB          0000000000000000 0002ed 000012 00      0   0  1
    Key to Flags:
      W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
      L (link order), O (extra OS processing required), G (group), T (TLS),
      C (compressed), x (unknown), o (OS specific), E (exclude),
      l (large), p (processor specific)
    $ 
      
    $ strip -R .eh_frame_hdr -R .eh_frame test
    $ strip -R .comment -R .note.gnu.build-id test
    strip: test: warning: empty loadable segment detected at vaddr=0x200000, is this intentional?
    $ ls -l test
    -rwxrwxr-x 1 fpm fpm 736 Dec  9 04:19 test
    $ readelf -W --section-headers test
    There are 3 section headers, starting at offset 0x220:
    
    Section Headers:
      [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
      [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
      [ 1] .text             PROGBITS        0000000000201200 000200 00000f 00  AX  0   0 16
      [ 2] .shstrtab         STRTAB          0000000000000000 00020f 000011 00      0   0  1
    Key to Flags:
      W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
      L (link order), O (extra OS processing required), G (group), T (TLS),
      C (compressed), x (unknown), o (OS specific), E (exclude),
      l (large), p (processor specific)
    $ 
    
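    The 736-byte total can be read off the last readelf output. Stripping removed sections but did not move .text, which still starts at file offset 0x200, with the ELF header, the program headers and the space formerly used by the removed sections in front of it. .shstrtab follows at 0x20f with size 0x11, ending at 0x220, which is exactly where the section header table starts; that table holds three 64-byte Elf64_Shdr entries and ends the file:

```latex
\text{file size} = \texttt{e\_shoff} + \texttt{e\_shnum} \times \texttt{e\_shentsize}
                 = \texttt{0x220} + 3 \times 64
                 = 544 + 192 = 736 \text{ bytes}
```
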

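    .text is your code, and .shstrtab is the section header string table, which holds the section names. The ELF file header (Elf64_Ehdr) has an e_shstrndx member giving the index of the .shstrtab section within the section header table, and each section header's sh_name member is a byte offset into .shstrtab. Following these two fields is how tools such as readelf find the name of each section.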
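
    As a sketch of that lookup, assuming a 64-bit ELF file in host byte order, no extended section numbering, and the Elf64_* definitions from glibc's <elf.h> (has_section is a hypothetical helper, not part of any library):

```c
#include <elf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return 1 if the ELF64 file at path has a section named name, 0 if it
   does not, and -1 on error. Assumes e_shnum != 0 and e_shstrndx < e_shnum. */
int has_section(const char *path, const char *name)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    int result = -1;
    Elf64_Ehdr eh;
    Elf64_Shdr *sh = NULL;
    char *strtab = NULL;
    Elf64_Xword len = 0;

    if (fread(&eh, sizeof eh, 1, f) != 1 ||
        memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 ||
        eh.e_shentsize != sizeof(Elf64_Shdr) ||
        eh.e_shnum == 0 || eh.e_shstrndx >= eh.e_shnum)
        goto done;

    /* The section header table starts at e_shoff and has e_shnum entries. */
    sh = malloc((size_t)eh.e_shnum * sizeof *sh);
    if (sh == NULL || fseek(f, (long)eh.e_shoff, SEEK_SET) != 0 ||
        fread(sh, sizeof *sh, eh.e_shnum, f) != (size_t)eh.e_shnum)
        goto done;

    /* e_shstrndx is the index of .shstrtab in that table. Load its bytes
       and append a NUL so a truncated final name cannot overrun the buffer. */
    len = sh[eh.e_shstrndx].sh_size;
    strtab = malloc(len + 1);
    if (strtab == NULL ||
        fseek(f, (long)sh[eh.e_shstrndx].sh_offset, SEEK_SET) != 0 ||
        fread(strtab, 1, len, f) != len)
        goto done;
    strtab[len] = '\0';

    /* Each section's sh_name is a byte offset into .shstrtab. */
    result = 0;
    for (unsigned i = 0; i < eh.e_shnum; i++) {
        if (sh[i].sh_name < len && strcmp(strtab + sh[i].sh_name, name) == 0) {
            result = 1;
            break;
        }
    }

done:
    free(strtab);
    free(sh);
    fclose(f);
    return result;
}
```

    For the stripped clang binary above, has_section("test", ".text") should return 1 and has_section("test", ".eh_frame") should return 0.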