Tags: c, dll, shared-libraries, static-libraries, dynamic-linking

Why aren't LIBs and DLLs interchangeable?


LIB files are static libraries that must be included at compile time, whereas DLL files can be "dynamically" accessed by a program during runtime. (DLLs must be linked, however, either implicitly before runtime with an import library (LIB) or explicitly via LoadLibrary).
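For reference, a minimal sketch of the explicit route, assuming a hypothetical mymath.dll that exports an add function:

    /* Explicit (runtime) linking: no import .lib is needed at link time.
       "mymath.dll" and its exported "add" function are hypothetical names. */
    #include <windows.h>
    #include <stdio.h>

    typedef int (*add_fn)(int, int);

    int main(void)
    {
        HMODULE lib = LoadLibraryA("mymath.dll");        /* load the DLL at runtime */
        if (!lib) {
            fprintf(stderr, "could not load mymath.dll\n");
            return 1;
        }

        add_fn add = (add_fn)GetProcAddress(lib, "add"); /* look up the exported symbol */
        if (add)
            printf("2 + 3 = %d\n", add(2, 3));

        FreeLibrary(lib);
        return 0;
    }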

My question is: why differentiate between these file types at all? Why can't LIB files be treated as DLLs and vice versa? It seems like an unnecessary distinction.

Some related questions:

DLL and LIB files - what and why?

Why are LIB files beasts of such a duplicitous nature?

DLL and LIB files

What exactly are DLL files, and how do they work?


Solution

  • You must differentiate between shareable objects and static libraries simply because they are really different objects.

    A shareable object file, such as a DLL or an SO, contains structures used by the loader to allow dynamic linking to and from other executable images (e.g. export tables).

    A DLL is to all intents and purposes an executable image which, being an executable, can be loaded into memory and possibly relocated (if its code is not position independent). It imports symbols from other DLLs, as executables do, but it also exports its own symbols so that other linked executables can access its functions and data.
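    As a rough illustration (file and symbol names here are made up, not from the question), the export/import mechanism looks like this:

        /* mymath.c -- built as a DLL, e.g. with:  cl /LD mymath.c
           (this also produces the import library mymath.lib) */
        __declspec(dllexport) int add(int a, int b)   /* becomes an entry in the export table */
        {
            return a + b;
        }

        /* main.c -- implicitly linked against the DLL, e.g.:  cl main.c mymath.lib
           The loader resolves "add" from mymath.dll when the program starts. */
        __declspec(dllimport) int add(int a, int b);

        int main(void)
        {
            return add(2, 3);
        }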

    Exported symbols can be used by the loader to interlink different executable modules in memory.

    A static library, on the other hand, is simply a collection of object modules that can be selectively linked into an executable program or even a DLL. Library files have an internal structure that allows them to store, catalog, and extract the object modules they contain.

    The object modules contain, in turn, machine code and placeholders for external symbols, which are referenced through the relocation table.
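    As a sketch (names and commands are only illustrative), a static library is built from object modules, and only the referenced modules end up in the final executable:

        /* add.c and sub.c are compiled to object modules and archived:
             cl /c add.c sub.c
             lib add.obj sub.obj /OUT:utils.lib
           Linking "cl main.c utils.lib" pulls in only add.obj, because only
           add() is actually referenced. */

        /* add.c */
        int add(int a, int b) { return a + b; }

        /* sub.c */
        int sub(int a, int b) { return a - b; }

        /* main.c */
        int add(int, int);
        int main(void) { return add(2, 3); }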

    The linker collects the object modules specified on its command line one by one, examining their symbol tables. Each time it discovers a reference that is not defined in the current object module (a function or a data location), it searches first in the other specified object modules and then, if not found, in the libraries in the order they appear on the command line (this is why the order of files on the command line matters: if a symbol is present in more than one library, the first one encountered is used). When the reference is found, the corresponding code or data is added to the output stream, in the respective sections, and the reference is set to its offset relative to the binary stream. This offset is called the virtual address.
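    A small made-up example of why that order matters:

        /* liba.lib and libb.lib each contain a module defining value():
             link main.obj liba.lib libb.lib   -> value() is taken from liba.lib
             link main.obj libb.lib liba.lib   -> value() is taken from libb.lib
           The linker keeps the first definition it meets while scanning the
           libraries in command-line order. */

        /* a.c (archived into liba.lib) */
        int value(void) { return 1; }

        /* b.c (archived into libb.lib) */
        int value(void) { return 2; }

        /* main.c */
        int value(void);
        int main(void) { return value(); }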

    If the code is in an object file the process is straightforward; when it is in a library, it is somewhat more complex.

    The linker first searches for the symbol in the headers of the library structure; if the reference is found, it computes the offset of the object module so it can extract it from the library. The remaining processing is the same as for standard object files (a library is just an archive of object modules).

    After a code or data section from an object module is added to the output stream, the placeholders in the code are replaced by the virtual addresses.

    The symbol table of the freshly added code is then scanned in turn and the process repeats: each new reference is fixed by assigning its virtual address if the symbol is already defined (present in the current stream), or is searched for again in the libraries.

    This is a recursive process that will end when no more undefined references remain.

    If a symbol cannot be found in any of the supplied modules and libraries, the linker issues an undefined-symbol error.

    At the end of the linking process you have a memory image of your executable or DLL. This image is read into memory by the loader, an OS component, which fixes some remaining references, adjusts the virtual addresses to the actual addresses where the image is mapped, and fills the import table with the addresses of symbols imported from external DLLs as they are mapped into the process.

    Now it should be clear that you can extract each single object module you need from an archive (library file), but you can't do the same with a DLL, because in a DLL all the modules have been merged, with no record of where each one begins and ends.

    More simply: an object module (an .obj file), or a collection of them (a .lib file), is quite different from a DLL: the first is raw, unlinked code; the second is a fully linked, 'ready to run' piece of code.

    The very reason for the existence of shareable objects and static libraries is efficiency and the rationalization of resources.

    When you statically link library modules, you replicate the same code in every executable you create with that static library, which means larger executable files that take longer to load, wasting kernel execution time and memory.

    When you use shareable objects, the code is loaded only the first time; for every subsequent executable that needs it, the loader only has to map the space where the DLL code lies into the new process's memory space and create a new data segment (this must be unique for each process to avoid conflicts), effectively optimizing memory and system usage (and lightening the loader's workload).

    So how do we choose between the two?

    Static linking is convenient when your code is used by a limited number of programs, in which case the effort of loading a separate DLL module isn't worth it.

    Static linking also lets you easily reference globals or other data defined by and local to the process. This is not possible, or not as easy, with a DLL: being a complete executable, it can't have undefined references, so any global it needs must be defined inside the DLL itself, and that single definition is shared by all the code that uses the DLL.
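    A minimal sketch of the DLL side of this (again with made-up names): any global the DLL code relies on has to be defined, and possibly exported, by the DLL itself:

        /* mymath.c (the DLL) -- the global is defined and exported here,
           because the DLL cannot be left with an undefined reference to it. */
        __declspec(dllexport) int call_count = 0;

        __declspec(dllexport) int add(int a, int b)
        {
            ++call_count;                     /* DLL code uses its own global */
            return a + b;
        }

        /* main.c (the client) -- it can reach that data only through the export;
           statically linked code could instead use a global defined in the exe itself. */
        __declspec(dllimport) extern int call_count;
        __declspec(dllimport) int add(int a, int b);

        int main(void)
        {
            add(1, 2);
            return call_count;
        }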

    Dynamic linking is convenient when the code is used by many programs, since it makes the loader's work more efficient and reduces memory usage. Examples of this are the system libraries, which are used by almost all programs, or compiler runtimes.