Tags: linux, compilation, kernel, archive, glibc

Use of archive files for compilation: why, and is there any alternative?


In the middle of compilation, the Linux kernel creates liba.a, which contains many built-in.o files and other object files from different directories, and uses it as a major component of the final vmlinux link. I have seen a similar use of archive files in the glibc build, and I am now wondering why those projects use archive files and what the benefit is.

As far as I know, archive files generated with ar are simply containers for the individual files included in them. I do not see much benefit to using them other than reducing file search time for each object file. Is this the reasoning behind the use of archive files in the middle of compilation?

If so, I would be surprised that file name search is significant enough for the kernel people to care about, and I wonder what the cost of not using archive files would be, and whether there is any alternative that solves the same problem without the space inefficiency of .a files.
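
(For reference, my mental model is roughly the sketch below; the file names are made up for illustration and are not the actual kbuild rules.)

    /* a.c and b.c stand in for two translation units that end up in one archive */
    int a_func(void) { return 1; }   /* a.c */
    int b_func(void) { return 2; }   /* b.c */

    /*
     * gcc -c a.c b.c                # produces a.o and b.o
     * ar rcs built-in.a a.o b.o     # pack them into an archive
     * ar t built-in.a               # lists a.o and b.o -- the archive is
     *                               # just a container for the members
     */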


Solution

  • Re: I do not see much benefit to using them other than reducing file search time for each object file.

    Your understanding of the benefit is not quite right. Archives reduce the workload of resolving individual symbols.

    If you link a program out of many individual .o files, the linker has to consider them all at the same time. The references can go in any direction. The very last .o on the command line can call a function in the very first .o and vice versa.

    This is not the case (by default, at least) with archives. With archives, functions in the earlier archives can only make references to symbols whose definitions appear in the later archives. (This is also the reason behind the traditional Unix convention that the -l linker options go at the end of the command line: your .o files first, then the libraries.)

    This means that once an archive appears which defines a symbol, you can be sure that the later archives do not use that symbol any more, which means you can remove it from your data structures. You are basically "done" linking that particular library; it has satisfied the prior references, and all that remains is to satisfy its own unresolved references. If you order the linking process right, and the software is nicely layered, you can minimize how many symbols are outstanding at any time. (The sketch at the end of this answer shows this ordering rule in practice.)

    Linux is more than 20 years old now and its build system has a long and rich history, just like the code. Archives were not used originally; I think that started only in 2.6. Also, dependencies were once generated by a GNU awk script. People built the kernel on 25 MHz 386 boxes with 4 megs of RAM, haha.

    Archives are used today because there was a need for them with the kernel getting larger. It's not just for the heck of it!
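
    To make the ordering rule concrete, here is a minimal sketch. The file and library names are made up, and the behaviour shown is the traditional single-pass behaviour of GNU ld; options such as --start-group/--end-group relax it.

        /* helper.c -- compiled into the archive */
        int helper(void) { return 42; }

        /* main.c -- references a symbol defined only in the archive */
        extern int helper(void);
        int main(void) { return helper(); }

        /*
         * gcc -c helper.c main.c
         * ar rcs libhelper.a helper.o
         *
         * gcc main.o -L. -lhelper -o prog   # works: main.o's undefined reference
         *                                   # to helper() is still outstanding when
         *                                   # the linker scans libhelper.a, so
         *                                   # helper.o is pulled in
         *
         * gcc -L. -lhelper main.o -o prog   # typically fails: when libhelper.a is
         *                                   # scanned, nothing references helper()
         *                                   # yet, so no member is extracted and
         *                                   # main.o's reference stays unresolved
         */

    References made before an archive appears can be satisfied by it; references introduced after it cannot, which is exactly the ordering constraint described above.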