
strcmp and strcmp_sse functions in libc


I've noticed that in libc.so, the choice of which strcmp_sse variant to call is made by the strcmp function itself.

Here is the code:

      strcmp:
 .text:000000000007B9F0                 cmp     cs:__cpu_features.kind, 0
 .text:000000000007B9F7                 jnz     short loc_7B9FE
 .text:000000000007B9F9                 call    __init_cpu_features
 .text:000000000007B9FE
 .text:000000000007B9FE loc_7B9FE:                              ; CODE XREF: .text:000000000007B9F7j
 .text:000000000007B9FE                 lea     rax, __strcmp_sse2_unaligned
 .text:000000000007BA05                 test    cs:__cpu_features.cpuid._eax, 10h
 .text:000000000007BA0F                 jnz     short locret_7BA2B
 .text:000000000007BA11                 lea     rax, __strcmp_ssse3
 .text:000000000007BA18                 test    cs:__cpu_features.cpuid._ecx, 200h
 .text:000000000007BA22                 jnz     short locret_7BA2B
 .text:000000000007BA24                 lea     rax, __strcmp_sse2
 .text:000000000007BA2B
 .text:000000000007BA2B locret_7BA2B:                           ; CODE XREF: .text:000000000007BA0Fj
 .text:000000000007BA2B                                         ; .text:000000000007BA22j
 .text:000000000007BA2B                 retn

What I do not understand is that the address of the chosen strcmp_sse function is placed in rax but never actually called. So I am wondering: who is going to call *rax, and when?


Solution

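  • The Linux dynamic linker supports a special symbol type called STT_GNU_IFUNC, and strcmp is most likely implemented as an IFUNC. A 'regular' symbol in a dynamic library is nothing more than a mapping from a name to an address. An IFUNC is a bit more complex: the address isn't readily available, and to obtain it the linker must execute a piece of code from the library itself (the resolver). The code shown here is an example of such a resolver: it does not compare strings, it only returns the address of the implementation to use. Note that in the x86_64 ABI a function returns its result in RAX, which is why the chosen address ends up there. It is the dynamic linker, not the caller, that takes this value and binds the strcmp symbol to it.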

    This technique is typically used to pick the optimal implementation based on the CPU features. Note that the selection logic runs only once, when the dynamic linker resolves the symbol. After that, calls to strcmp go straight to the selected implementation.
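
The same mechanism can be used from C. Below is a minimal sketch, assuming GCC or Clang on x86 Linux with glibc (the `ifunc` attribute and the `__builtin_cpu_*` builtins are compiler extensions). The names `my_strcmp`, `my_strcmp_generic`, `my_strcmp_ssse3` and `resolve_my_strcmp` are hypothetical and are not glibc's own; the "ssse3" variant is only a stand-in that reuses the generic logic.

```c
/* Hypothetical baseline implementation: plain byte-by-byte comparison. */
static int my_strcmp_generic(const char *a, const char *b)
{
    while (*a != '\0' && *a == *b) {
        a++;
        b++;
    }
    return (unsigned char)*a - (unsigned char)*b;
}

/* Hypothetical "optimized" variant. A real one would use SSSE3
 * instructions; this stand-in simply reuses the generic logic. */
static int my_strcmp_ssse3(const char *a, const char *b)
{
    return my_strcmp_generic(a, b);
}

typedef int (*strcmp_fn)(const char *, const char *);

/* The resolver: it plays the role of the disassembled routine in the
 * question. It does not compare anything; it only returns the address
 * of the implementation to use (in RAX on x86_64). */
static strcmp_fn resolve_my_strcmp(void)
{
    /* Needed because the resolver may run before normal constructors. */
    __builtin_cpu_init();
    if (__builtin_cpu_supports("ssse3"))
        return my_strcmp_ssse3;
    return my_strcmp_generic;
}

/* Declares my_strcmp as an STT_GNU_IFUNC symbol whose address is
 * obtained by calling resolve_my_strcmp. */
int my_strcmp(const char *a, const char *b)
    __attribute__((ifunc("resolve_my_strcmp")));
```

Callers simply write `my_strcmp("abc", "abd")`. They never call the resolver themselves; the dynamic linker does so while binding the symbol and then routes calls to whichever function it returned.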