I'm investigating the size of an extremely small C program on Linux (ubuntu 20.04).
I'm compiling as follows:
gcc -s -nostdlib test.c -o test
the following progam:
__attribute__((naked))
void _start() {
asm("movl $1,%eax;"
"xorl %ebx,%ebx;"
"int $0x80");
}
Basically the idea is to make the Linux system call to exit rather than depending on the C runtime to do that for us. (which would be the case in void main() { }
). The program moves 1 into register EAX, clears register EBX (which would otherwise contain the return value), and then executes the linux system call interrupt 0x80. This interrupt triggers the kernel to process our call.
I would expect this program to be extremely small (less than 1K), however ..
du -h test
# >> 16K
ldd test
# >> statically linked
Why is this program still 16K?
du
reports the disk space used by a file whereas ls
reports the actual size of a file. Typically the size reported by du
is significantly larger for small files.
You can significantly reduce the size of the binary by changing compile and linking options and stripping out unnecessary sections.
$ cat test.c
void _start() {
asm("movl $1,%eax;"
"xorl %ebx,%ebx;"
"int $0x80");
}
$ gcc -s -nostdlib test.c -o test
$ ./test
$ ls -l test
-rwxrwxr-x 1 fpm fpm 8840 Dec 9 04:09 test
$ readelf -W --section-headers test
There are 7 section headers, starting at offset 0x20c8:
Section Headers:
[Nr] Name Type Address Off Size ES Flg Lk Inf Al
[ 0] NULL 0000000000000000 000000 000000 00 0 0 0
[ 1] .note.gnu.build-id NOTE 0000000000400190 000190 000024 00 A 0 0 4
[ 2] .text PROGBITS 0000000000401000 001000 000010 00 AX 0 0 1
[ 3] .eh_frame_hdr PROGBITS 0000000000402000 002000 000014 00 A 0 0 4
[ 4] .eh_frame PROGBITS 0000000000402018 002018 000038 00 A 0 0 8
[ 5] .comment PROGBITS 0000000000000000 002050 00002e 01 MS 0 0 1
[ 6] .shstrtab STRTAB 0000000000000000 00207e 000045 00 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
L (link order), O (extra OS processing required), G (group), T (TLS),
C (compressed), x (unknown), o (OS specific), E (exclude),
l (large), p (processor specific)
$
$ gcc -s -nostdlib -Wl,--nmagic test.c -o test
$ ls -l test
-rwxrwxr-x 1 fpm fpm 984 Dec 9 16:55 test
$ strip -R .comment -R .note.gnu.build-id test
$ strip -R .eh_frame_hdr -R .eh_frame test
$ ls -l test
-rwxrwxr-x 1 fpm fpm 520 Dec 9 17:03 test
$
Note that clang
can produce a significantly smaller binary than gcc
by default in this particular instance. However, after compiling with clang
and stripping unnecessary sections, the final size of the binary is 736 bytes, which is bigger than the 520 bytes possible with gcc -s -nostdlib -Wl,--nmagic test.c -o test
.
$ clang -static -nostdlib -flto -fuse-ld=lld -o test test.c
$ ls -l test
-rwxrwxr-x 1 fpm fpm 1344 Dec 9 04:15 test
$
$ readelf -W --section-headers test
There are 9 section headers, starting at offset 0x300:
Section Headers:
[Nr] Name Type Address Off Size ES Flg Lk Inf Al
[ 0] NULL 0000000000000000 000000 000000 00 0 0 0
[ 1] .note.gnu.build-id NOTE 0000000000200190 000190 000018 00 A 0 0 4
[ 2] .eh_frame_hdr PROGBITS 00000000002001a8 0001a8 000014 00 A 0 0 4
[ 3] .eh_frame PROGBITS 00000000002001c0 0001c0 00003c 00 A 0 0 8
[ 4] .text PROGBITS 0000000000201200 000200 00000f 00 AX 0 0 16
[ 5] .comment PROGBITS 0000000000000000 00020f 000040 01 MS 0 0 1
[ 6] .symtab SYMTAB 0000000000000000 000250 000048 18 8 2 8
[ 7] .shstrtab STRTAB 0000000000000000 000298 000055 00 0 0 1
[ 8] .strtab STRTAB 0000000000000000 0002ed 000012 00 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
L (link order), O (extra OS processing required), G (group), T (TLS),
C (compressed), x (unknown), o (OS specific), E (exclude),
l (large), p (processor specific)
$
$ strip -R .eh_frame_hdr -R .eh_frame test
$ strip -R .comment -R .note.gnu.build-id test
strip: test: warning: empty loadable segment detected at vaddr=0x200000, is this intentional?
$ ls -l test
-rwxrwxr-x 1 fpm fpm 736 Dec 9 04:19 test
$ readelf -W --section-headers test
There are 3 section headers, starting at offset 0x220:
Section Headers:
[Nr] Name Type Address Off Size ES Flg Lk Inf Al
[ 0] NULL 0000000000000000 000000 000000 00 0 0 0
[ 1] .text PROGBITS 0000000000201200 000200 00000f 00 AX 0 0 16
[ 2] .shstrtab STRTAB 0000000000000000 00020f 000011 00 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
L (link order), O (extra OS processing required), G (group), T (TLS),
C (compressed), x (unknown), o (OS specific), E (exclude),
l (large), p (processor specific)
$
.text
is your code, .shstrtab
is the Section Header String table. Each ElfHeader
structure contains an e_shstrndx
member which is an index into the .shstrtab
table. If you use this index, you can find the name of that section.