Tags: python, dll, linker, ctypes, static-linking

How does Python consume DLL files without needing the import file (.lib)?


I just finished a tutorial on building a DLL. From the tutorial I learnt that a DLL also has an associated lib file, which the linker uses to statically link information into the client program. The lib file contains information such as the memory address locations of the functions inside the DLL.

My confusion comes in when using Python. With Python we seem to use pyd files, which are DLL-format files with added information that makes them callable from Python. In addition, I have seen code examples that use the ctypes library to call into DLL files without the associated lib file. So I am confused about why we need the lib file when using the DLL in the Microsoft tutorial, and how Python gets by without it when calling into DLLs through either a pyd or the ctypes library.


Solution

  • A DLL import library (Microsoft extension .lib) is required only by the Microsoft linker, to link a program against a DLL. That means, of course, that it is required only at link time, not at runtime. An import library has no runtime function at all; it is not a loadable file.

    Furthermore, an import library is required only by the Microsoft linker, for historical reasons. It is not technically necessary to use an import library to link a program against a DLL. The MSYS2/MinGW-w64 linker that is invoked by GCC on Windows does not need import libraries: it can link directly against a DLL, although it can also use an import library if it finds one first.

    An import library name.lib serves the linker as a statically linkable proxy for name.dll, which itself cannot be statically linked. In the simplest terms, name.lib is an archive of little object files which between them convey to the linker information such as this:

    Here are some symbol names symbol1, symbol2,...,symbolN which may not be defined by any regular object files available to the linkage but are defined in a DLL name.dll.

    Usually symbol1, symbol2,...,symbolN are names of functions exported by name.dll. The linker extracts these little object files from the archive and links the information statically into the program. Then at runtime, the runtime linker will detect this information when it is asked to load the program; it will search its runtime library path for name.dll and - if successful - it will load name.dll into the address space of the program and resolve the program's references to symbol1, symbol2,...,symbolN to the definitions that are (hopefully) provided by name.dll. If that too is successful (for all DLLs that the program needs) the program is finally allowed to run. The role of the import library in the life of the program is finished once the static linker has created the program.
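As a rough illustration of the load-time binding just described, here is a toy model in Python. It is only an analogy, assuming ctypes and a C runtime DLL (msvcrt on Windows, otherwise whatever ctypes.util.find_library("c") reports): IMPORT_TABLE and bind_imports are hypothetical names standing in for the information an import library contributes and for the work the OS loader does with it, namely resolving every listed symbol up front and refusing to "run" if any of them is missing.

```python
import ctypes
import ctypes.util
import os

# Hypothetical stand-in for what an import library contributes to a program:
# which DLL to load and which symbols the program expects that DLL to define.
C_RUNTIME = "msvcrt" if os.name == "nt" else ctypes.util.find_library("c")
IMPORT_TABLE = {C_RUNTIME: ["strlen", "abs"]}


def bind_imports(import_table):
    """Resolve every imported symbol before the "program" runs (all or nothing)."""
    resolved = {}
    for dll_name, symbols in import_table.items():
        # Load the DLL; ctypes raises OSError if it cannot be found.
        # (On POSIX, a name of None means "symbols already in this process".)
        dll = ctypes.CDLL(dll_name)
        for name in symbols:
            try:
                resolved[name] = getattr(dll, name)
            except AttributeError:
                raise ImportError(f"{name} is not exported by {dll_name}") from None
    return resolved


imports = bind_imports(IMPORT_TABLE)  # only if this succeeds does "main" run

strlen = imports["strlen"]
strlen.argtypes = [ctypes.c_char_p]
strlen.restype = ctypes.c_size_t
print(strlen(b"import"))  # 6
```

The real loader works from tables inside the executable rather than from a Python dict, but the all-or-nothing behaviour is the point being modelled.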

    This process of obtaining information about a DLL that a program depends on, and about the symbols that the program references that are defined in the DLL, and statically linking this information into the program as instructions to the runtime linker - that's what we mean by linking against a DLL.

    As noted already, it is not necessary to use an import library to accomplish this process. Import libraries are the Microsoft way of doing it. On Linux and other OSes (Python runs on all of them), import libraries aren't used at all to link against dynamic libraries. Instead, the dynamic library itself is input to the static linker. The dynamic library cannot be statically linked, but the linker examines it to see which undefined symbols in the program it defines, and then statically links the necessary instructions to the runtime linker into the program.

    Not only is an import library unnecessary; it is also unnecessary to statically link any information about name.dll into a program to enable the program to load name.dll and call functions in it. If a program believes that name.dll exists on the system and wants to reference a symbol it believes is defined in name.dll (usually, to call a function defined in name.dll), it can itself ask the OS to find and load name.dll, using the Win32 API function LoadLibrary, and then ask for the address of the symbol it wants from name.dll, using the Win32 API function GetProcAddress.
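For concreteness, here is a minimal sketch of that do-it-yourself route from Python, using ctypes to call LoadLibraryW and GetProcAddress in kernel32.dll directly. The POSIX branch, using the analogous dlopen and dlsym functions, is an added assumption so that the sketch also runs outside Windows; runtime_lookup is a hypothetical helper name, and the C runtime with its strlen function is just a convenient example of a DLL and a symbol.

```python
import ctypes
import ctypes.util
import os


def runtime_lookup(libname, symbol):
    """Load `libname` at runtime and return the address of `symbol` in it."""
    if os.name == "nt":
        from ctypes import wintypes

        kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
        kernel32.LoadLibraryW.argtypes = [wintypes.LPCWSTR]
        kernel32.LoadLibraryW.restype = wintypes.HMODULE
        kernel32.GetProcAddress.argtypes = [wintypes.HMODULE, wintypes.LPCSTR]
        kernel32.GetProcAddress.restype = ctypes.c_void_p
        handle = kernel32.LoadLibraryW(libname)  # OS searches for and maps the DLL
        if not handle:
            raise ctypes.WinError(ctypes.get_last_error())
        address = kernel32.GetProcAddress(handle, symbol.encode("ascii"))
        if not address:
            raise ctypes.WinError(ctypes.get_last_error())
        return address

    # POSIX analogue (assumption): dlopen/dlsym are already present in the process.
    process = ctypes.CDLL(None)
    process.dlopen.argtypes = [ctypes.c_char_p, ctypes.c_int]
    process.dlopen.restype = ctypes.c_void_p
    process.dlsym.argtypes = [ctypes.c_void_p, ctypes.c_char_p]
    process.dlsym.restype = ctypes.c_void_p
    process.dlerror.restype = ctypes.c_char_p
    handle = process.dlopen(libname.encode() if libname else None, os.RTLD_NOW)
    if not handle:
        raise OSError((process.dlerror() or b"dlopen failed").decode())
    address = process.dlsym(handle, symbol.encode("ascii"))
    if not address:
        raise OSError((process.dlerror() or b"dlsym failed").decode())
    return address


libname = "msvcrt" if os.name == "nt" else ctypes.util.find_library("c")
address = runtime_lookup(libname, "strlen")
# Give the raw address a C signature: size_t strlen(const char *)
strlen = ctypes.CFUNCTYPE(ctypes.c_size_t, ctypes.c_char_p)(address)
print(strlen(b"hello"))  # 5
```

Nothing about the C runtime was linked into the Python process for this purpose; the DLL name and the symbol name are plain strings supplied at runtime.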

    So statically linking instructions to the runtime linker into a program, whether or not an import library is used to do it, is a way to avoid performing the runtime linkage of a DLL programmatically and to have the OS do it all instead when the program is run. An import library is just one optional means of achieving that.

    Python, of course, is a runtime interpreter. It does not invoke the static linker at all. If you ask it to import your .pyd module, Python loads the DLL programmatically: on Windows, CPython calls LoadLibraryExW on the .pyd and then uses GetProcAddress to find the module's initialization function, PyInit_<modulename>. If you use ctypes to call into a C or C++ DLL, Python either loads the DLL programmatically for you, or you can explicitly load it programmatically yourself by calling cdll.LoadLibrary(libname).
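A .pyd (or, on Linux and macOS, a .so extension module) is an ordinary DLL that exports a PyInit_<name> function, so ctypes can open it like any other DLL. The sketch below assumes that at least one of a few standard-library modules (_ctypes, _socket, select, _struct, array) was built as a separate extension file in your installation rather than compiled into the interpreter; find_extension_module is a hypothetical helper. It then looks up the init function by name, which mirrors the lookup the import system performs, with no import library anywhere.

```python
import ctypes
import importlib.machinery
import importlib.util

# File suffixes the import system accepts for compiled extension modules,
# e.g. ['.cp312-win_amd64.pyd', '.pyd'] on 64-bit Windows with CPython 3.12.
print(importlib.machinery.EXTENSION_SUFFIXES)


def find_extension_module(candidates=("_ctypes", "_socket", "select", "_struct", "array")):
    """Return (name, path) of the first candidate that is a compiled extension file."""
    for name in candidates:
        spec = importlib.util.find_spec(name)
        if spec is not None and isinstance(spec.loader, importlib.machinery.ExtensionFileLoader):
            return name, spec.origin
    return None  # every candidate is built into the interpreter (or missing)


found = find_extension_module()
if found:
    name, path = found
    # Open the extension module as an ordinary DLL / shared library.
    lib = ctypes.PyDLL(path)
    init_name = "PyInit_" + name.rpartition(".")[2]
    # True if the module's initialization function is exported by name.
    print(path, init_name, hasattr(lib, init_name))
```

If your Python was built with these modules compiled in, find_extension_module returns None and there is simply no separate file to inspect.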

    Any DLL that Python needs to load, either for its own purposes or for yours, is either a DLL that it loads programmatically or a DLL that Python was linked against when Python was built, which is loaded automatically along with Python. In either case, as a user of Python you never need the import library for that DLL, and this is true whether you run Python with a shell command such as python3 myscript.py or embed it in a C or C++ program that has been linked with [lib]python3dll.
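To see the libraries that were loaded along with the interpreter (plus any extension modules already imported), you can list what is mapped into the running process. This sketch assumes Linux, where /proc/self/maps lists every file mapped into the process; loaded_shared_libraries is a hypothetical helper, and on Windows a different mechanism (such as enumerating the process's loaded modules) would be needed.

```python
import sys


def loaded_shared_libraries(maps_path="/proc/self/maps"):
    """Return the sorted paths of shared objects mapped into this process (Linux only)."""
    libraries = set()
    with open(maps_path) as maps:
        for line in maps:
            # Fields: address perms offset dev inode [pathname]; the path may contain spaces.
            fields = line.split(maxsplit=5)
            if len(fields) == 6:
                path = fields[5].strip()
                if ".so" in path and path.startswith("/"):
                    libraries.add(path)
    return sorted(libraries)


if sys.platform.startswith("linux"):
    for path in loaded_shared_libraries():
        print(path)  # e.g. the C library and the dynamic loader itself
```

None of these entries required an import library: the ones Python was linked against were recorded at build time and loaded by the OS, and the rest were loaded programmatically.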