linux mips semaphore ld glibc

MIPSEL GLIBC sem_init() not shared


I'm developing an application that runs on Linux on the MIPSEL platform. The application uses POSIX semaphores inside one of its DSOs, and my concern here is the sem_init() call made inside one DSO on which the app depends.

The application currently uses glibc 2.31, but I observed the same phenomenon with glibc 2.26.

Below I explain the anomaly and the current state of my troubleshooting.

It all started when I noticed that a process waiting on a semaphore would not wake up when the semaphore's value was incremented.

For the record, I'm aware that for multiprocessing the semaphore must sit in a memory area that is shared between the processes that use it.

Investigating further with strace, I noticed that the semaphore I initialized with sem_init(&semaphore, 1, 0) ended up being private, even though I had initialized it as shared.

futex(0x774ce0bc, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {tv_sec=1615204364, tv_nsec=515578961}, FUTEX_BITSET_MATCH_ANY) = -1 ETIMEDOUT (Connection timed out)
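For reference, the intended behaviour can be reproduced in isolation with a small sketch like the one below. It is not the code from the application: the function name shared_sem_roundtrip() is made up for illustration, and an anonymous shared mapping plus fork() is assumed as the way the two processes share the semaphore. With a working process-shared semaphore, strace on such a program should show futex operations without the _PRIVATE suffix.

```c
#define _DEFAULT_SOURCE   /* MAP_ANONYMOUS, clock_gettime on glibc */
#include <errno.h>
#include <semaphore.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 0 if the parent was woken by the child's
 * sem_post(), -1 on any error or timeout.
 */
int shared_sem_roundtrip(void)
{
    /* The semaphore must live in memory visible to both processes. */
    sem_t *sem = mmap(NULL, sizeof *sem, PROT_READ | PROT_WRITE,
                      MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (sem == MAP_FAILED)
        return -1;

    /* pshared = 1 requests a process-shared semaphore; initial value 0. */
    if (sem_init(sem, 1, 0) != 0) {
        munmap(sem, sizeof *sem);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        sem_destroy(sem);
        munmap(sem, sizeof *sem);
        return -1;
    }
    if (pid == 0) {
        /* Child: increment the semaphore once and leave immediately. */
        _exit(sem_post(sem) == 0 ? 0 : 1);
    }

    /* Parent: wait at most 5 seconds so a broken setup cannot hang forever. */
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += 5;

    int rc;
    while ((rc = sem_timedwait(sem, &deadline)) == -1 && errno == EINTR)
        continue;   /* retry if interrupted by a signal */

    int status = 0;
    waitpid(pid, &status, 0);

    sem_destroy(sem);
    munmap(sem, sizeof *sem);

    return (rc == 0 && WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

Calling shared_sem_roundtrip() from a small main() and checking for a return value of 0 is enough to use it as a smoke test. On glibc versions that still ship a separate libpthread, the program has to be linked with -pthread.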

Next, I debugged the semaphore initialization with gdb. There I discovered that in this particular combination (MIPSEL, Linux, glibc), sem_init() exists in two versions:

$ mipsel-linux-objdump -T libpthread.so.0  | grep sem_init
00012e30 g    DF .text  00000040 (GLIBC_2.0)  sem_init
00012dd0 g    DF .text  00000060  GLIBC_2.2   sem_init

For some reason, the linker bound my code to the older version (which does not support shared semaphores) instead of the newer one.

Looking at the glibc source where sem_init() is implemented (/nptl/sem_init.c), I realized that sem_init() is an alias and that two internal symbols uniquely identify the two implementations.

My next move was to call the function I needed, __new_sem_init(), directly, just to be sure that no linker alchemy I'm not aware of would interfere with my intentions.

Unfortunately, the symbol is not exported, so it can't be used.

Looking at my library liba.so, which uses sem_wait(), sem_init(), and sem_post(), I noticed that these symbols are all exported by libpthread.so.0.

$ mipsel-linux-nm -D libpthread.so.0 | grep sem_
00014a10 T sem_clockwait
00013784 T sem_close
00012e70 T sem_destroy
00012e70 T sem_destroy
00013b14 T sem_getvalue
00013b00 T sem_getvalue
00012e30 T sem_init
00012dd0 T sem_init
000131b8 T sem_open
00014ab0 T sem_post
00014b90 T sem_post
00014508 T sem_timedwait
00013f40 T sem_trywait
00013fac T sem_trywait
00013920 T sem_unlink
00013eb0 T sem_wait
00014020 T sem_wait

But looking at the liba.so dependencies, I was surprised to find that the library depends only on the main C library (libc.so.6) and has no dependency on libpthread.so.0, where the actual sem_* symbols are implemented.

$ /lib/ld-2.31.so --list /usr/lib/liba.so 
        linux-vdso.so.1 (0x7ffaa000)
        libc.so.6 => /lib/libc.so.6 (0x77ddc000)
        /lib/ld-2.31.so (0x77f8a000)

I guess this is strongly related to my problem, but I have no clue how to connect these two facts. How could liba.so be compiled and linked using symbols that its dependencies do not provide?

$ mipsel-linux-objdump -T libc.so.6  | grep sem
000fab80 g    DF .text  00000070  GLIBC_2.0   semget
000fabf0 g    DF .text  000000b0  GLIBC_2.2   semctl
000faca0  w   DF .text  00000074  GLIBC_2.3.3 semtimedop
000359d0 g    DF .text  00000074  GLIBC_2.0   sigisemptyset
00149854 g    DF .text  000000b0 (GLIBC_2.0)  semctl
000fab60 g    DF .text  00000018  GLIBC_2.0   semop
$

How can I force the GNU linker to link against sem_init@@GLIBC_2.2 instead of sem_init@GLIBC_2.0?


Solution

  • It turned out that my problem was indeed an underlinking issue.

    After researching the literature and running a considerable number of tests, I was finally able to resolve my original problem: making my application use the correct version of sem_init().

    My suspicion about the library not carrying the libpthread dependency was on the right track.

    Checking my Makefile, I saw that the -lpthread switch was missing from the command that creates the shared library.

    That was the root of the problem, and once I added the switch, everything started working as expected.

    One thing that delayed finding the solution was that building the same library with the same Makefile for a different target did include the libpthread dependency in the library.

    However, I still have no explanation for why the dynamic linker picked the older symbol sem_init@GLIBC_2.0 instead of sem_init@@GLIBC_2.2.

    I understand that the dynamic linker was able to resolve my sem_init() because the executable itself has libpthread as a dependency, so at dynamic link time the library was already loaded and the symbol could be found.

    What I cannot explain is why it picked sem_init@GLIBC_2.0.

    According to these sources:

    http://peeterjoot.com/2019/09/20/an-example-of-linux-glibc-symbol-versioning/ "The @@ one means that it applies to new code, whereas the @MYSTUFF_1.1 is a load only function, and no new code can use that symbol."

    https://developers.redhat.com/blog/2019/08/01/how-the-gnu-c-library-handles-backward-compatibility/ "The @@ tells the dynamic linker that this version is the default version."

    https://web.archive.org/web/20100430151127/http://www.trevorpounds.com/blog/?33 "The double @@ can only be defined once for a given symbol since it denotes the default version to use."

    So if a symbol reference is unversioned, the dynamic linker should pick the definition that carries the double @@.

    Is this statement true only if the symbol is unversioned and the object that requires the symbol properly declares a dependency on the DSO containing that symbol?
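The fix described in this answer (passing the missing libpthread link flag when creating the shared library) can be sketched as follows. This is only an illustration under stated assumptions: the file name liba.c and the function liba_sem_setup() are hypothetical stand-ins for the real library, and the driver names mipsel-linux-gcc and mipsel-linux-readelf are assumed from the toolchain prefix seen in the question. The -Wl,--no-undefined option is not part of the original fix; it is a GNU ld option that should turn this kind of underlinking into a build error.

```c
/*
 * liba.c: hypothetical stand-in for the DSO discussed above.
 *
 * Build with the thread library named explicitly, so the linker records
 * both the dependency and the symbol version it resolved sem_init against:
 *
 *   mipsel-linux-gcc -shared -fPIC -o liba.so liba.c -lpthread
 *
 * Optionally make unresolved symbols a link-time error:
 *
 *   mipsel-linux-gcc -shared -fPIC -Wl,--no-undefined -o liba.so liba.c -lpthread
 *
 * Afterwards, inspect what was recorded in the library:
 *
 *   mipsel-linux-readelf -d liba.so | grep NEEDED
 *   mipsel-linux-objdump -T liba.so | grep sem_init
 */
#include <semaphore.h>

/*
 * Initialise a process-shared semaphore with value 0.
 * The caller is responsible for placing *sem in shared memory.
 * Returns 0 on success, -1 on error (errno set by sem_init).
 */
int liba_sem_setup(sem_t *sem)
{
    return sem_init(sem, 1, 0);
}
```

If the dependency was recorded, the NEEDED entries should list libpthread.so.0 alongside libc.so.6, and the undefined sem_init reference in liba.so should carry a version tag rather than being unversioned.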