glibc now uses SSE 4.2 to optimize strncmp.
This can be seen in a debugger:
0xf7f20218 <__strncmp_sse4_2+40> movdqu xmm2, xmmword ptr [edx]
0xf7f2021c <__strncmp_sse4_2+44> mov ecx, eax
► 0xf7f2021e <__strncmp_sse4_2+46> and ecx, 0xfff
0xf7f20224 <__strncmp_sse4_2+52> cmp ecx, 0xff0
0xf7f2022a <__strncmp_sse4_2+58> ja __strncmp_sse4_2+125 <__strncmp_sse4_2+125>
I'm not steeped in the SSE 4.2 string instructions, but my understanding is that they let strncmp compare up to 16 bytes at a time. The movdqu xmm2, xmmword ptr [edx] instruction loads 16 bytes from one of the strings.
My question is: if a short string (say 3 bytes) sits at the end of a page, with its NUL terminator inside the page but some of the remaining 13 bytes outside it, couldn't that cause a segfault, since we're now trying to load beyond the page we have access to?
This question came up in working on an emulator, which trapped an unconstrained access (that is, a read of memory which my application never wrote to):
strncmp(0x8064dd8, 0x7ffeff48, 0x4)
WARNING Filling memory at 0x7ffeff60 with 4 unconstrained bytes referenced from 0x818ba90 (strncmp+0x0 in libc.so.6 (0x8ba90))
This is perplexing, because:
That is, it doesn't seem to be a bug in the caller, but rather unexpected behavior by strncmp. Debugging strncmp led me to the SSE 4.2 code, which partially explains why it reads beyond the limit set by n: it simply uses SSE 4.2 to load many bytes at once, even if it doesn't need them.
Questions:
Is this correct? Does strncmp_sse4_2 read more than n bytes?
Yes.
Even if it does: Doing 16 bytes at a time should stop at 0x7ffeff58. Why does it read till 0x7ffeff60?
You are assuming that it started using movdqu from the address you passed in. It likely didn't. It probably aligned the pointers to a cache-line boundary first.
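To make the alignment point concrete, here is a minimal sketch of the address arithmetic. The helper names align_down_16 and aligned_read_end are made up for illustration, and this is not glibc's actual code path. It only shows that once reads happen in aligned 16-byte blocks, the bytes touched need not start or end where the caller's buffer does.

```c
#include <stdint.h>

/* Hypothetical helper: round an address down to the previous
   16-byte boundary by clearing the low four bits. */
static uintptr_t align_down_16(uintptr_t addr)
{
    return addr & ~(uintptr_t)15;
}

/* Hypothetical helper: first address past the last aligned 16-byte
   block touched when reading len bytes starting at addr (len > 0). */
static uintptr_t aligned_read_end(uintptr_t addr, uintptr_t len)
{
    return align_down_16(addr + len - 1) + 16;
}
```

For the pointer in the question, align_down_16(0x7ffeff48) gives 0x7ffeff40, so an aligned read would start 8 bytes before the string itself.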
If so, how does this not potentially cause a page fault?
If you have a 16-byte aligned pointer p, then p+15 points to the same page as p (page sizes are multiples of 16), so you can read 16 bytes from p with impunity.
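The same reasoning can be written as a small check. This is a sketch assuming 4 KiB pages. The names SKETCH_PAGE_SIZE, SKETCH_VEC_SIZE and load16_crosses_page are invented here. The and ecx, 0xfff / cmp ecx, 0xff0 / ja sequence in the disassembly above appears to perform a comparison of this shape, though that reading is an interpretation, not something confirmed from the glibc source.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumption: 4 KiB pages */
#define SKETCH_VEC_SIZE  16u    /* width of one SSE load in bytes */

/* Returns true if a 16-byte load starting at addr would touch the
   following page, i.e. the offset within the page is past the last
   position where 16 bytes still fit. */
static bool load16_crosses_page(uintptr_t addr)
{
    uintptr_t offset = addr & (SKETCH_PAGE_SIZE - 1);
    return offset > SKETCH_PAGE_SIZE - SKETCH_VEC_SIZE;
}
```

A 16-byte aligned address has a page offset of at most 0xff0, so this check never fires for it. An unaligned address only needs special handling when it falls in the last 15 bytes of a page.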
If so, how do we distinguish an acceptable read of uninitialized data from cases that indicate bugs? E.g. how would Valgrind avoid reporting this as an uninitialized read?
Valgrind does this by interposing its own copy of strcmp (for dynamically linked binaries). Without such interposition, valgrind produces false positives (or rather, valgrind produces true positives which nobody cares about or could do anything about).
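As an illustration of what an interposed replacement could look like, here is a byte-at-a-time comparison that never reads past the first difference, the terminating NUL, or n bytes. This is a hypothetical sketch with an invented name (bytewise_strncmp), not Valgrind's actual implementation. An emulator could substitute something similar for the SSE 4.2 routine to avoid the over-reads.

```c
#include <stddef.h>

/* Hypothetical replacement for strncmp: reads one byte at a time and
   stops at the first difference, at a NUL, or after n bytes, so it
   never touches memory beyond what the comparison requires. */
static int bytewise_strncmp(const char *s1, const char *s2, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        unsigned char c1 = (unsigned char)s1[i];
        unsigned char c2 = (unsigned char)s2[i];
        if (c1 != c2)
            return c1 < c2 ? -1 : 1;
        if (c1 == '\0')
            return 0;   /* both strings ended at the same position */
    }
    return 0;           /* first n bytes are equal */
}
```

The trade-off is speed: this gives up the 16-bytes-per-step throughput in exchange for reads that a checker can reason about exactly.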