
Port glibc 2.25 and test memory functions


I was investigating whether a few memory functions (memcpy, memset, memmove) in glibc-2.25, with their various versions (sse4, ssse3, avx2, avx512), could bring a performance gain for our server programs on Linux (glibc 2.12).
My first attempt was to download a tarball of glibc-2.25 and build/test it following the instructions at https://sourceware.org/glibc/wiki/Testing/Builds. I manually commented out the kernel version check and everything went well. I then linked a test program against the newly built glibc, following the procedure in the "Compile against glibc build tree" section of the glibc wiki, and 'ldd test' shows that it indeed depends on the expected libraries:

    # $GLIBC is /data8/home/wentingli/temp/glibc/build
    libm.so.6 => /data8/home/wentingli/temp/glibc/build/math/libm.so.6 (0x00007fe42f364000)
    libc.so.6 => /data8/home/wentingli/temp/glibc/build/libc.so.6 (0x00007fe42efc4000)
    /data8/home/wentingli/temp/glibc/build/elf/ld-linux-x86-64.so.2 => /lib64/ld-linux-x86-64.so.2 (0x00007fe42f787000)
    libdl.so.2 => /data8/home/wentingli/temp/glibc/build/dlfcn/libdl.so.2 (0x00007fe42edc0000)
    libpthread.so.0 => /data8/home/wentingli/temp/glibc/build/nptl/libpthread.so.0 (0x00007fe42eba2000)
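
The question does not include the test program itself. A minimal timing harness of the kind one might link against the new build could look like the sketch below. It assumes that clock_gettime(CLOCK_MONOTONIC) is precise enough and that repeatedly hitting one warm buffer is representative, which a real server workload may not be; bench_memset and bench_memcpy are hypothetical names.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Monotonic time in nanoseconds. */
static double now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1e9 + (double)ts.tv_nsec;
}

/* Hypothetical helper: average nanoseconds per memset call on a buffer of
   `size` bytes, or -1.0 on bad arguments or allocation failure. */
double bench_memset(size_t size, int iters)
{
    if (size == 0 || iters <= 0)
        return -1.0;
    unsigned char *buf = malloc(size);
    if (buf == NULL)
        return -1.0;
    /* Calling through a volatile function pointer keeps the compiler from
       inlining or removing the call, so the libc variant chosen at load
       time is what gets measured. */
    void *(*volatile fn)(void *, int, size_t) = memset;
    double start = now_ns();
    for (int i = 0; i < iters; i++)
        fn(buf, i & 0xff, size);
    double elapsed = now_ns() - start;
    volatile unsigned char sink = buf[size - 1]; /* keep the stores live */
    (void)sink;
    free(buf);
    return elapsed / iters;
}

/* Hypothetical helper: same measurement for memcpy. */
double bench_memcpy(size_t size, int iters)
{
    if (size == 0 || iters <= 0)
        return -1.0;
    unsigned char *src = malloc(size);
    unsigned char *dst = malloc(size);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1.0;
    }
    memset(src, 0x5a, size);
    void *(*volatile fn)(void *, const void *, size_t) = memcpy;
    double start = now_ns();
    for (int i = 0; i < iters; i++)
        fn(dst, src, size);
    double elapsed = now_ns() - start;
    volatile unsigned char sink = dst[size - 1];
    (void)sink;
    free(src);
    free(dst);
    return elapsed / iters;
}
```

On a glibc older than 2.17 (such as the 2.12 system in the question), clock_gettime lives in librt, so the program also needs -lrt when built against the system libraries.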

I used gdb to check which memset/memcpy was actually called, but it always shows __memset_sse2_unaligned_erms, whereas I expected a more advanced version of the function (avx2, avx512) to be used. My questions are:

  1. Does glibc-2.25 automatically select the most suitable version of the memory functions according to the CPU/OS/memory address? If not, am I missing some configuration in the glibc build, or is something wrong with my setup?
  2. Are there any other alternatives for porting the memory functions from a newer glibc?

Any help or suggestion would be appreciated.


Solution

  • On x86, glibc automatically selects the implementation that is most suitable for the system's CPU, usually based on guidance from Intel. (Whether that is the best choice for your workload may not be clear, because the performance trade-offs of many vector instructions are extremely complex.) This selection only fails to happen if IFUNCs are explicitly disabled in the toolchain, and that does not apply here, because __memset_sse2_unaligned_erms is not the default implementation used in that case. ERMS (Enhanced REP MOVSB/STOSB) is a fairly recent CPU feature, so this choice is not completely unreasonable.

    Building a new glibc is probably the right approach to test these string functions. Theoretically, you could also use LD_PRELOAD to override the glibc-provided functions, but it is a bit cumbersome to build the string functions outside the glibc build system.

    If you want to run a program against a patched glibc without installing the latter, you need to use the testrun.sh script in the glibc build directory (or a similar approach).
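
The automatic selection described in the answer relies on GNU indirect functions (IFUNC): the dynamic linker calls a resolver once and binds the symbol to whichever implementation the resolver returns. The sketch below shows that mechanism, assuming GCC or Clang on x86-64 Linux. The function names are hypothetical, and the resolver is far simpler than glibc's own, which considers more CPU features (ERMS, for example, which is where the _erms suffix comes from).

```c
#include <stddef.h>

/* Hypothetical variants standing in for __memset_sse2_unaligned_erms,
   __memset_avx2_unaligned_erms and so on; each only reports its name. */
static const char *variant_avx512f(void) { return "avx512f"; }
static const char *variant_avx2(void)    { return "avx2"; }
static const char *variant_sse2(void)    { return "sse2"; }

typedef const char *variant_fn(void);

/* The resolver runs while the dynamic linker processes relocations, before
   any constructors, so __builtin_cpu_init() must be called explicitly
   before __builtin_cpu_supports(). */
static variant_fn *resolve_variant(void)
{
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx512f"))
        return variant_avx512f;
    if (__builtin_cpu_supports("avx2"))
        return variant_avx2;
    return variant_sse2;
}

/* Every call to selected_variant() goes to whatever resolve_variant()
   returned when the program was loaded. */
const char *selected_variant(void) __attribute__((ifunc("resolve_variant")));
```

Calling selected_variant() on the machine from the question gives a rough indication of which vector extensions the compiler's runtime sees; if it does not report avx2 there, an SSE2-based glibc variant would be expected.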
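
To illustrate the LD_PRELOAD route the answer mentions, here is a minimal interposer sketch. It is not taken from glibc: it is a deliberately naive byte-at-a-time memset that only demonstrates the override mechanism, and preload_memset_calls is a hypothetical counter. A real experiment would instead compile one of the optimized glibc variants into the shim, which is the cumbersome part.

```c
#include <stddef.h>

/* Hypothetical counter so a caller can confirm the override was reached. */
unsigned long preload_memset_calls = 0;

/* Replacement memset. Built into a shared object and loaded with
   LD_PRELOAD, this definition takes precedence over libc's for calls
   resolved through the dynamic symbol table. */
void *memset(void *s, int c, size_t n)
{
    /* Storing through a volatile pointer stops the compiler from
       recognising the loop as a memset idiom and emitting a call to
       memset, which would recurse into this function. */
    volatile unsigned char *p = s;
    unsigned char byte = (unsigned char)c;
    for (size_t i = 0; i < n; i++)
        p[i] = byte;
    __atomic_fetch_add(&preload_memset_calls, 1, __ATOMIC_RELAXED);
    return s;
}
```

It could be built with `gcc -O2 -shared -fPIC -o libmymemset.so mymemset.c` and used as `LD_PRELOAD=./libmymemset.so ./test`. Calls that glibc makes to memset internally generally go through hidden aliases and are not redirected, so only calls from the program and from other libraries are affected.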
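
When running through testrun.sh, a quick sanity check is to have the test program report which glibc it was compiled against and which one it is actually running on. A minimal sketch, assuming the program is compiled against the new build's headers: gnu_get_libc_version() and the __GLIBC__/__GLIBC_MINOR__ macros are provided by glibc, while runtime_matches_headers and report_glibc are hypothetical helper names.

```c
#include <gnu/libc-version.h>
#include <stdio.h>

/* Returns 1 if the major.minor version of the libc.so loaded at run time
   equals the version of the headers used at compile time, else 0. */
int runtime_matches_headers(void)
{
    int major = 0, minor = 0;
    if (sscanf(gnu_get_libc_version(), "%d.%d", &major, &minor) != 2)
        return 0;
    return major == __GLIBC__ && minor == __GLIBC_MINOR__;
}

/* Prints both versions so a run can be checked at a glance. */
void report_glibc(FILE *out)
{
    fprintf(out, "compiled against glibc %d.%d, running on glibc %s\n",
            __GLIBC__, __GLIBC_MINOR__, gnu_get_libc_version());
}
```

Invoked as `$GLIBC/testrun.sh ./test`, both numbers would be expected to read 2.25; a mismatch, or a failure to start at all when the binary is run directly, would suggest the system glibc is being picked up instead.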