
How to build CPython with static libbz2


I've been trying to build CPython with static libs so as to port it to another machine. I've managed to make it work: it compiles fine and is accessible from the other machine. But the moment I try to execute a module that requires bz2, it fails because it cannot find libbz2.so.

I compile Python with:

./configure --prefix=/prj/build/python --enable-shared=no --disable-shared LDFLAGS="-L/usr/local/lib64 -L/usr/lib/x86_64-linux-gnu -l:libffi.a -l:libbz2.a" CFLAGS="-fPIC"
make && make install

As you can see, I've been trying to make the configure use the static lib already. I have installed the library with apt-get install libbz2-dev. It works well for libffi but not for libbz2. It goes straight for the .so instead.

I've read elsewhere that people suggest using "-static" as a flag, but that makes everything static, which I don't want. I only need those two built-in modules static.

Thank you very much.


Solution

  • I believe that if you test more carefully you'll find that your attempt to statically link libffi and libbz2 is actually ineffectual in both cases.

    The python3 extension module that interfaces to libffi is ctypes. The module that interfaces to libbz2 is of course bz2. If you write a minimal python script that calls into ctypes and another that calls into bz2, then run each script with the python3 interpreter you've built in this way:

    $ LD_DEBUG=libs /prj/build/python/bin/python3 <testscript>.py
    

    then setting LD_DEBUG=libs in the environment prompts the runtime linker to display information about each shared library that it loads to run the program. You'll see that the bz2 module tester dynamically loads libbz2.so and that the ctypes tester dynamically loads libffi.so, which is exactly what you aim to avoid.
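
    A minimal sketch of such a tester follows. It folds both checks into one file for brevity; the file name probe.py and the helper names are made up for illustration, and the /proc/self/maps cross-check is an addition that assumes Linux. Split it into two scripts if you want separate LD_DEBUG output for each module.

```python
"""probe.py (hypothetical name): exercise bz2 and ctypes, then report mapped libraries."""
import bz2      # importing bz2 loads the _bz2 extension module
import ctypes   # importing ctypes loads the _ctypes extension module


def probe_bz2() -> bytes:
    """Round-trip some data through libbz2 and return the compressed bytes."""
    data = b"hello " * 100
    packed = bz2.compress(data)
    assert bz2.decompress(packed) == data
    return packed


def probe_ctypes() -> int:
    """Call the C library's abs() through ctypes, which dispatches via libffi."""
    libc = ctypes.CDLL(None)  # POSIX: a handle to the running process itself
    return libc.abs(-7)


def mapped_files(substring: str) -> list:
    """List files mapped into this process whose path contains substring (Linux only)."""
    found = set()
    try:
        with open("/proc/self/maps") as maps:
            for line in maps:
                fields = line.split(None, 5)
                if len(fields) == 6 and substring in fields[5]:
                    found.add(fields[5].strip())
    except FileNotFoundError:
        pass  # no /proc on this platform; LD_DEBUG output is then the only evidence
    return sorted(found)


if __name__ == "__main__":
    probe_bz2()
    probe_ctypes()
    for name in ("libbz2", "libffi"):
        print(name, "->", mapped_files(name) or "not mapped as a separate file")
```

    Run it as LD_DEBUG=libs /prj/build/python/bin/python3 probe.py. If a shared libbz2 or libffi shows up in either the LD_DEBUG trace or the script's own report, the static link did not take effect.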

    Your ./configure... command misses the mark

    The setting LDFLAGS="... -l:libffi.a -l:libbz2.a" that you passed to ./configure fails to link against those libraries in the builds of the ctypes and bz2 modules and links with the default shared libraries instead. If we run your build and capture the build log, we can see that this is so and why.

    For the ctypes module, the link step output is:

    gcc -shared -L/usr/local/lib64 -L/usr/lib/x86_64-linux-gnu -l:libffi.a -l:libbz2.a  \
       Modules/_ctypes/_ctypes.o \
       Modules/_ctypes/callbacks.o Modules/_ctypes/callproc.o \
       Modules/_ctypes/stgdict.o \
       Modules/_ctypes/cfield.o \
       -lffi  -ldl  -o Modules/_ctypes.cpython-313-x86_64-linux-gnu.so
       
    

    For the bz2 module, the link step output is:

    gcc -shared -L/usr/local/lib64 -L/usr/lib/x86_64-linux-gnu -l:libffi.a -l:libbz2.a \
        Modules/_bz2module.o \
        -lbz2  -o Modules/_bz2.cpython-313-x86_64-linux-gnu.so
        
    

    Notice first that in both cases the LDFLAGS value you have input to ./configure, i.e.

    -L/usr/local/lib64 -L/usr/lib/x86_64-linux-gnu -l:libffi.a -l:libbz2.a
    

    precedes all the object files to be linked.

    Notice also that the default libraries for these linkages - the ones that you are trying to knock out with -l:libffi.a -l:libbz2.a - are still in the linkages, but they come after all the object files. In the bz2 linkage, -lbz2 is there, after the object files. And in the ctypes linkage, -lffi is there, after the object files.

    That's probably what you expected, assuming that the default shared libraries would be knocked out by the static ones that you have placed before them. But remember how the linker resolves symbols:

    • It scans the sequence of input files - explicitly named object files, explicitly named libraries and -l-libraries discovered by search - just once from left to right.

    • It unconditionally links any object file but it searches a library only to find definitions of undefined symbol references that have accrued earlier in the linkage.

    • When a static library member, e.g. libffi.a(objfile.o), is found to provide a needed definition then the object file objfile.o is copied out of the archive and statically linked into the output file.

    • When a shared library, e.g. libffi.so is found to provide a needed definition then the linker just drops a note into the output file that will prompt the dynamic linker at runtime to load the necessary shared library and try to resolve remaining undefined symbols against it - that remainder being the symbols that could not be statically defined from the object files linked at buildtime.

    When the ctypes and bz2 modules are linked as above, -l:libffi.a -l:libbz2.a specify the very first two files that the linker encounters. They are both libraries. Since nothing at all has so far been linked into the output file, there are 0 undefined symbol references to resolve, and therefore nothing to search for in libraries. libffi.a and libbz2.a contribute nothing to the linkage at this point and are never considered again.

    After libffi.a and libbz2.a are ignored, the linker finds the object files and links them all into the output file. That brings undefined symbol references into the linkage and obliges the linker to search subsequent libraries for definitions. Next come the default -l options that you wanted to knock out, -lffi for the ctypes module and -lbz2 for the bz2 module. We are not doing a -static linkage, so the linker will for preference resolve these -l options to the shared libraries libffi.so and libbz2.so. Which it duly does, and finds needed definitions in them. So they are added to the dynamic dependencies of the respective modules, just as they would be by default.
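
    The walk-through above can be mimicked with a toy model. The sketch below is not how ld is implemented: it only encodes the rules as listed, it treats each library as a single unit rather than per archive member, the Input class and link function are made up for illustration, and BZ2_bzCompress stands in for whatever symbols _bz2module.o actually references.

```python
from dataclasses import dataclass, field


@dataclass
class Input:
    name: str
    kind: str  # "object", "static" or "shared"
    defines: set = field(default_factory=set)
    needs: set = field(default_factory=set)


def link(inputs):
    """Single left-to-right pass; returns (static libs used, dynamic deps, still undefined)."""
    defined, undefined = set(), set()
    static_used, dynamic_deps = [], []
    for item in inputs:
        if item.kind == "object":
            # Object files are linked unconditionally.
            defined |= item.defines
            undefined |= item.needs
            undefined -= defined
            continue
        # A library matters only if it defines something that is undefined right now.
        hits = undefined & item.defines
        if not hits:
            continue  # skipped, and never revisited later in the pass
        undefined -= hits
        defined |= hits
        (static_used if item.kind == "static" else dynamic_deps).append(item.name)
    return static_used, dynamic_deps, undefined


if __name__ == "__main__":
    bz2_obj = Input("Modules/_bz2module.o", "object", needs={"BZ2_bzCompress"})
    static_bz2 = Input("-l:libbz2.a", "static", defines={"BZ2_bzCompress"})
    shared_bz2 = Input("-lbz2", "shared", defines={"BZ2_bzCompress"})

    # Archive before the object file, as in the build log above.
    print(link([static_bz2, bz2_obj, shared_bz2]))
    # Archive after the object file.
    print(link([bz2_obj, static_bz2, shared_bz2]))
```

    In the first ordering the archive is skipped and -lbz2 ends up as a dynamic dependency; in the second the archive satisfies the reference and -lbz2 contributes nothing.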

    Why your ./configure command makes that mistake

    Your ./configure command results in -l:libffi.a -l:libbz2.a being interpolated into linkage commands in a position where it has no effect. This stems from a misunderstanding of the autotools linkage environment variables LDFLAGS and LIBS. Their proper roles are clarified by running ./configure (-h|--help) in the build directory (of any autotooled package) and looking for those variables in the section Some influential environment variables:

    Some influential environment variables
      ...[cut]...   
      LDFLAGS     linker flags, e.g. -L<lib dir> if you have libraries in a
                  nonstandard directory <lib dir>
      LIBS        libraries to pass to the linker, e.g. -l<library>
      ...[cut]...
      
    

    $LDFLAGS is interpolated into linkage commands before the object files are consumed, $LIBS after them. This distinction lets you avoid interpolating -l options uselessly before the object files.
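
    As a rough illustration of that ordering, the sketch below assembles a link line the way the convention describes. It is not CPython's actual makefile rule; the link_command helper is hypothetical and only shows where each variable's contents land relative to the object files.

```python
import shlex


def link_command(objects, output, ldflags="", libs="", cc="gcc"):
    """Assemble a -shared link line in the order: LDFLAGS, object files, LIBS."""
    return [cc, "-shared", *shlex.split(ldflags), *objects, *shlex.split(libs), "-o", output]


if __name__ == "__main__":
    objs = ["Modules/_bz2module.o"]
    out = "Modules/_bz2.cpython-313-x86_64-linux-gnu.so"
    # -l option placed in LDFLAGS: it lands before the object file.
    print(shlex.join(link_command(objs, out, ldflags="-L/usr/local/lib64 -l:libbz2.a")))
    # -l option placed in LIBS: it lands after the object file.
    print(shlex.join(link_command(objs, out, ldflags="-L/usr/local/lib64", libs="-l:libbz2.a")))
```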

    The obvious fix - and why it is a loser

    It seems obvious that your ./configure command should define:

    LDFLAGS="-L/usr/local/lib64 -L/usr/lib/x86_64-linux-gnu" LIBS="-l:libffi.a -l:libbz2.a"
    

    to put -l:libffi.a and -l:libbz2.a in a linkage position where they matter.

    But this is ineffective in a different way. If you build like that and save the build log, you'll see that whereas previously -l:libffi.a -l:libbz2.a appeared uselessly in the linkage of every module, now they appear in the linkage of none.

    Once again, ./configure -h explains.

    Some influential environment variables:
    ...[cut]...
      LIBFFI_CFLAGS
                  C compiler flags for LIBFFI, overriding pkg-config
      LIBFFI_LIBS linker flags for LIBFFI, overriding pkg-config
    ...[cut]...
      BZIP2_CFLAGS
                  C compiler flags for BZIP2, overriding pkg-config
      BZIP2_LIBS  linker flags for BZIP2, overriding pkg-config
    ...[cut]...
    

    All of the extension module builds depend on upstream libraries whose package maintainers define the compilation and linkage options that downstream builds require. These options are expressed in the library's .pc file, which comes with the library's development package and is installed somewhere like /usr/lib/x86_64-linux-gnu/pkgconfig/. From there, the pkg-config tool can retrieve them with commands like these (see note 1):

    $ pkg-config --cflags libffi
    # ...Nothing
    $ pkg-config --libs libffi
    -lffi
    

    The ./configure -h output shows that, for each upstream library LIBNAME used by an extension module, the python3 build takes the libraries to link from the value of LIBNAME_LIBS if that variable is defined, and otherwise from the output of pkg-config --libs for that library. (Since python3 does not build these upstream libraries, LIBNAME_CFLAGS is irrelevant.)

    This is just as it should be. But it means that the ./configure-ed value of LIBS is unused in the extension module builds.
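
    The lookup order just described can be sketched as follows. This is an approximation of the behaviour, not the configure script's code: module_libs is a hypothetical helper, and the pkg-config module names used in the example ("bzip2", "libffi") are assumptions about what the build queries. The final fallback mirrors the note about the build going ahead with -lbz2 when no .pc file exists.

```python
import os
import shutil
import subprocess


def module_libs(var_name, pc_name, fallback, env=None):
    """Return linker flags: explicit override first, then pkg-config, then a default."""
    env = os.environ if env is None else env
    if var_name in env:
        return env[var_name]  # e.g. BZIP2_LIBS passed to ./configure
    pkg_config = shutil.which("pkg-config")
    if pkg_config:
        result = subprocess.run(
            [pkg_config, "--libs", pc_name], capture_output=True, text=True
        )
        if result.returncode == 0:
            return result.stdout.strip()
    return fallback  # no override and no usable .pc file


if __name__ == "__main__":
    print(module_libs("BZIP2_LIBS", "bzip2", "-lbz2"))
    print(module_libs("LIBFFI_LIBS", "libffi", "-lffi"))
    # With an override in place, pkg-config is never consulted.
    print(module_libs("BZIP2_LIBS", "bzip2", "-lbz2", env={"BZIP2_LIBS": "-l:libbz2.a"}))
```

    Note that a general LIBS variable appears nowhere in this lookup, which is why setting it has no effect on the module linkages.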

    The correct ./configure ... command - and why even that is insufficient

    To replace the LIBS values sourced from pkg-config in the module builds for ctypes and bz2, your ./configure command should pass:

    LIBFFI_LIBS=-l:libffi.a BZIP2_LIBS=-l:libbz2.a
    

    If you build on that basis, you'll get conclusive proof that libffi.a and libbz2.a have been used - because when either of these libraries is used in a -shared linkage the build will fail with a relocation linkage error, like:

    gcc -shared -L/usr/local/lib64 -L/usr/lib/x86_64-linux-gnu     Modules/_bz2module.o -l:libbz2.a  -o Modules/_bz2.cpython-313-x86_64-linux-gnu.so
    /usr/bin/ld: /usr/lib/x86_64-linux-gnu/libbz2.a(bzlib.o): warning: relocation against `stdout@@GLIBC_2.2.5' in read-only section `.text'
    /usr/bin/ld: /usr/lib/x86_64-linux-gnu/libbz2.a(bzlib.o): relocation R_X86_64_PC32 against symbol `BZ2_crc32Table' can not be used when making a shared object; recompile with -fPIC
    /usr/bin/ld: final link failed: bad value
    collect2: error: ld returned 1 exit status
    

    That happens because the packaged static libraries contain object files, like libbz2.a(bzlib.o), that have not been compiled as position independent code (-fPIC). They can't be linked into a shared library, which must be PIC; they're not intended for -shared linkage.

    Passing CFLAGS=-fPIC in your own ./configure command can't fix this, because -fPIC is a compilation option, and you are not compiling the non-PIC object files in your static libraries; you're just trying to link them -shared. The relocation error diagnostic:

    ...libbz2.a(bzlib.o): relocation R_X86_64_PC32 against symbol `BZ2_crc32Table' \
    can not be used when making a shared object; recompile with -fPIC
    

    is telling you that the object file libbz2.a(bzlib.o) needs to be recompiled -fPIC. But you haven't got the source code.

    What else you need to do for success

    So before you fix the ./configure command for the python3 build, you first need to get the source packages for bzip2 and libffi and build local static libraries libbz2.a and libffi.a, specifying -fPIC in the CFLAGS.

    It looks as if /usr/local/lib64 is where you want local libraries, so you could install your -fPIC static libraries there. Maybe rename them libbz2-PIC.a and libffi-PIC.a for clarity and amend your python3 ./configure command line accordingly.
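
    For illustration, the amended command line could be assembled from the pieces discussed so far, as in the sketch below. The configure_argv helper is hypothetical, the archive names are the suggested renames, and any other options from your original command that you still want would need to be appended.

```python
import shlex


def configure_argv(prefix, lib_dir, ffi_archive, bz2_archive):
    """Build the ./configure argument list with per-library *_LIBS overrides."""
    return [
        "./configure",
        f"--prefix={prefix}",
        "--disable-shared",
        f"LDFLAGS=-L{lib_dir}",             # search path only, no -l options here
        f"LIBFFI_LIBS=-l:{ffi_archive}",    # replaces the pkg-config value for _ctypes
        f"BZIP2_LIBS=-l:{bz2_archive}",     # replaces the default -lbz2 for _bz2
    ]


if __name__ == "__main__":
    argv = configure_argv(
        "/prj/build/python", "/usr/local/lib64", "libffi-PIC.a", "libbz2-PIC.a"
    )
    print(shlex.join(argv))
```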

    You'll then be able to build python3 successfully, and the LD_DEBUG=libs test will show you that the bz2 module does not load libbz2.so and that the ctypes module does not load libffi.so.
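
    As a complementary check to LD_DEBUG, you could inspect the dynamic section of the built extension modules directly. The sketch below assumes readelf from binutils is on the PATH and that the modules were built as separate .so files; parse_needed and module_needed are hypothetical helper names.

```python
import importlib.util
import re
import subprocess


def parse_needed(readelf_output: str) -> list:
    """Extract library names from the (NEEDED) entries of `readelf -d` output."""
    return re.findall(r"\(NEEDED\)\s+Shared library: \[([^\]]+)\]", readelf_output)


def module_needed(module_name: str) -> list:
    """Return the shared libraries that an extension module declares as dependencies."""
    spec = importlib.util.find_spec(module_name)
    if spec is None or not spec.origin or not spec.origin.endswith(".so"):
        raise ValueError(f"{module_name} is not a shared extension module here")
    out = subprocess.run(
        ["readelf", "-d", spec.origin], capture_output=True, text=True, check=True
    ).stdout
    return parse_needed(out)


if __name__ == "__main__":
    for name in ("_bz2", "_ctypes"):
        try:
            print(name, module_needed(name))
        except (ValueError, OSError, subprocess.CalledProcessError) as exc:
            print(name, "could not be inspected:", exc)
```

    Run it with the interpreter you built. After a successful static link, neither list should contain a libbz2 or libffi entry.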


    1. It's a long-standing bug in Debian-derived distros, and maybe others, that the libbz2-dev package lacks a pkg-config .pc file. In this case the python3 build just goes ahead with -lbz2 to link the library.