
Do strcmp and strstr test binary equivalence?


https://learn.microsoft.com/en-us/windows/win32/intl/security-considerations--international-features

This webpage makes me wonder. Apparently some Windows API functions may consider two strings equal even when they are different byte sequences. I want to know how the C standard library behaves in this respect.

In other words, does strcmp(a,b)==0 imply strlen(a)==strlen(b) && memcmp(a,b,strlen(a))==0? And what about the other string functions, including the wide-character versions?

Edit:

For example, CompareStringW treats L"\x00C5" and L"\x212B" as equal. This line:

    printf("%d\n", CompareStringW(LOCALE_INVARIANT, 0, L"\x00C5", -1, L"\x212B", -1) == CSTR_EQUAL);

outputs 1.

What I'm asking is whether I can rely on the C library functions never behaving like this.


Solution

  • The regular string functions compare byte by byte. The C standard says:

    The sign of a nonzero value returned by the comparison functions memcmp, strcmp, and strncmp is determined by the sign of the difference between the values of the first pair of characters (both interpreted as unsigned char) that differ in the objects being compared.

    strcmp() and memcmp() perform the same comparison. The only difference is how the end is found: strcmp() uses the null terminators in the strings as the limit, memcmp() takes a length parameter, and strncmp() takes a length parameter and stops at whichever limit comes first. So yes, strcmp(a,b)==0 does imply strlen(a)==strlen(b) && memcmp(a,b,strlen(a))==0.

    For the wide-string functions, the specification says:

    Unless explicitly stated otherwise, the functions described in this subclause order two wide characters the same way as two integers of the underlying integer type designated by wchar_t.

    wcscmp() doesn't say otherwise, so it also compares the wide characters numerically, rather than mapping different encodings of the same character to some common representation. wcscmp() is to wmemcmp() as strcmp() is to memcmp().

    On the other hand, wcscoll() (like its narrow counterpart strcoll()) compares the strings as interpreted according to the LC_COLLATE category of the current locale, so its result may not match memcmp().

    For other functions, check their documentation to see whether their behavior depends on the current locale.