I created a simple hello world program in C like so:
#include <stdio.h>

int main() {
    printf("Hello World!\n");
    return 0;
}
Afterwards, I compiled it on Mac using gcc and dumped it using xxd. With 16 bytes per line (8 words), the compiled program was a total of 3073 lines or 49 424 bytes. Out of all these bytes, only 1 904 of them composed the program while the remaining 47 520 bytes were all zeros.
Considering that only approximately 3.9% of the bytes are not zeros, this is a clear example of a waste of space. Is there any way to optimize the size of the executable here? (By the way, I already tried using the -Os compiler option and got no results.)
Edit: I got these numbers by counting lines of hexdump, but within the lines containing actual instructions there were also zeros. I didn't count these bytes as they may be crucial to the execution of the program. (Like the null terminator for the string Hello World!) I only counted full blocks of zeros.
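The counting described in the edit can also be done in code rather than by eye. This is only a minimal sketch: it assumes the file has already been read into a buffer, treats a "full block of zeros" as one complete 16-byte line (xxd's default width), and uses the hypothetical helper names count_zero_lines and nonzero_percent.

```c
#include <assert.h>
#include <stddef.h>

#define BYTES_PER_LINE 16 /* xxd's default number of bytes per line */

/* Counts the complete 16-byte lines of buf that contain only zero bytes.
   A trailing partial line is ignored, and zeros inside a line that also
   holds non-zero bytes are not counted, as in the edit above. */
size_t count_zero_lines(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    size_t zero_lines = 0;
    for (size_t off = 0; off + BYTES_PER_LINE <= len; off += BYTES_PER_LINE) {
        size_t i = 0;
        while (i < BYTES_PER_LINE && buf[off + i] == 0)
            i++;
        if (i == BYTES_PER_LINE)
            zero_lines++;
    }
    return zero_lines;
}

/* Percentage of the len bytes that are not part of an all-zero line. */
double nonzero_percent(size_t zero_lines, size_t len)
{
    if (len == 0)
        return 0.0;
    return 100.0 * (double)(len - zero_lines * BYTES_PER_LINE) / (double)len;
}
```

With the figures from the question, 47 520 zero bytes correspond to 2970 all-zero lines, and nonzero_percent(2970, 49424) comes out at roughly 3.9.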
gcc on macOS generates object and executable files in the Mach-O file format. The file is divided into multiple segments, each of which has an alignment requirement to make loading more efficient (which is why you get all the zero padding). I took your code and built it on my Mac with gcc, which gave me an executable of 8432 bytes. Yes, xxd gives me a bunch of zeros. Here's the objdump output of the section headers:
$ objdump -section-headers hello
hello: file format Mach-O 64-bit x86-64
Sections:
Idx Name             Size     Address          Type
  0 __text           0000002a 0000000100000f50 TEXT
  1 __stubs          00000006 0000000100000f7a TEXT
  2 __stub_helper    0000001a 0000000100000f80 TEXT
  3 __cstring        0000000f 0000000100000f9a DATA
  4 __unwind_info    00000048 0000000100000fac DATA
  5 __nl_symbol_ptr  00000010 0000000100001000 DATA
  6 __la_symbol_ptr  00000008 0000000100001010 DATA
__text contains the machine code of your program, __cstring contains the literal "Hello World!\n", and there's a bunch of metadata associated with each section.
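To put a number on how little of the file those sections occupy, the Size and Address columns from the objdump output can be added up. The sketch below simply hard-codes the values from that listing. The struct and function names are made up for illustration, and the gaps it reports are gaps between virtual addresses, which may not map one-to-one onto bytes in the file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One row of the section-header listing (hypothetical type for this sketch). */
struct section {
    const char *name;
    uint64_t size;
    uint64_t addr;
};

/* Values copied from the objdump output above. */
const struct section hello_sections[] = {
    {"__text",          0x2a, 0x100000f50ULL},
    {"__stubs",         0x06, 0x100000f7aULL},
    {"__stub_helper",   0x1a, 0x100000f80ULL},
    {"__cstring",       0x0f, 0x100000f9aULL},
    {"__unwind_info",   0x48, 0x100000facULL},
    {"__nl_symbol_ptr", 0x10, 0x100001000ULL},
    {"__la_symbol_ptr", 0x08, 0x100001010ULL},
};
#define HELLO_SECTION_COUNT (sizeof hello_sections / sizeof hello_sections[0])

/* Sum of the Size column. */
uint64_t total_section_bytes(const struct section *s, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += s[i].size;
    return total;
}

/* Unused address space between consecutive sections.
   Assumes the array is sorted by address, as in the listing. */
uint64_t gap_bytes(const struct section *s, size_t n)
{
    uint64_t gaps = 0;
    for (size_t i = 1; i < n; i++) {
        uint64_t prev_end = s[i - 1].addr + s[i - 1].size;
        if (s[i].addr > prev_end)
            gaps += s[i].addr - prev_end;
    }
    return gaps;
}
```

For this listing, total_section_bytes returns 185 and gap_bytes returns 15, so the listed sections account for 185 of the 8432 bytes in the file. The first section also starts at offset 0xf50 within its page, leaving room before it for the header, load commands, and padding.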
This kind of structure is obviously overkill for a simple program like yours, but simple programs like yours are not the norm. Object and executable file formats have to be able to support dynamic loading, symbol relocation, and other things that require complex structures. There's a minimum level of complexity (and thus size) for any compiled program.
So executable files for "small" programs are larger than you think they should be based on the source code, but realize there's a lot more than just your source code in there.
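The alignment padding mentioned earlier can be sketched as a round-up to the next multiple of the alignment. This is only an illustration: it assumes segments are padded to the page size, and that the page size is 4 KiB on x86-64 Macs and 16 KiB on Apple silicon. Neither assumption is stated in the output above, and align_up and padding_for are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* Rounds size up to the next multiple of alignment.
   alignment must be a non-zero power of two. */
uint64_t align_up(uint64_t size, uint64_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (size + alignment - 1) & ~(alignment - 1);
}

/* Number of padding bytes needed to reach that multiple. */
uint64_t padding_for(uint64_t size, uint64_t alignment)
{
    return align_up(size, alignment) - size;
}
```

Under those assumptions, a segment holding only a couple of hundred bytes still occupies a full page: padding_for(185, 4096) is 3911 bytes. The two file sizes in this thread fit the same pattern, since 8432 = 2 × 4096 + 240 and 49 424 = 3 × 16 384 + 272. That is consistent with a few page-aligned segments followed by a small unaligned tail, though the exact split is a guess.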