I've seen that in libc.so the actual type of strcmp_sse to call is decided by the function strcmp itself.
Here it is the code:
strcmp:
.text:000000000007B9F0 cmp cs:__cpu_features.kind, 0
.text:000000000007B9F7 jnz short loc_7B9FE
.text:000000000007B9F9 call __init_cpu_features
.text:000000000007B9FE
.text:000000000007B9FE loc_7B9FE: ; CODE XREF: .text:000000000007B9F7j
.text:000000000007B9FE lea rax, __strcmp_sse2_unaligned
.text:000000000007BA05 test cs:__cpu_features.cpuid._eax, 10h
.text:000000000007BA0F jnz short locret_7BA2B
.text:000000000007BA11 lea rax, __strcmp_ssse3
.text:000000000007BA18 test cs:__cpu_features.cpuid._ecx, 200h
.text:000000000007BA22 jnz short locret_7BA2B
.text:000000000007BA24 lea rax, __strcmp_sse2
.text:000000000007BA2B
.text:000000000007BA2B locret_7BA2B: ; CODE XREF: .text:000000000007BA0Fj
.text:000000000007BA2B ; .text:000000000007BA22j
.text:000000000007BA2B retn
What I do not understand is that the address of the strcmp_sse function to call is placed in rax and never actually called. Therefore I am wondering: who is going do call *rax? When?
Linux dynamic linker supports a special symbol type called STT_GNU_IFUNC. Strcmp is likely implemented as an IFUNC. 'Regular' symbols in a dynamic library are nothing more but a mapping from a name to the address. IFUNCs are a bit more complex than that: the address isn't readily available, in order to obtain it the linker must execute a piece of code from the library itself. We are seeing an example of such a peice of code here. Note that in x86_64 ABI a function returns the result in RAX.
This technique is typically used to pick the optimal implementation based on the CPU features. Please note that the selection logic runs only once; all but the first call to strcmp are fast.