
What is the difference between strcmp() and strcoll()?


I tried to understand both functions, but I could not find any difference between them, except that for strcoll() this reference says that it

compares two null terminated strings according to current locale as defined by the LC_COLLATE category.

On second thought, and I know this is really another question that needs a detailed answer: what exactly is this locale, for both C and C++?


Solution

  • strcmp() takes the bytes of each string one by one and compares them as is (as unsigned char values), whatever the bytes represent.

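    strcoll() takes the bytes, transforms them using the locale, then compares the result. The transformation re-orders characters according to the collation rules of the language. In French, accented letters come after the unaccented ones, so é sorts after e. However, é still sorts before f. strcoll() gets this right. strcmp() does not: it compares raw byte values, and the bytes that encode é (0xE9 in Latin-1, 0xC3 0xA9 in UTF-8) are greater than every ASCII letter, so é ends up after z.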
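To make the e / é / f ordering concrete, here is a minimal sketch. The helper names (sign, compare_bytes, compare_collated, use_collation) are made up for illustration, and é is written as its UTF-8 bytes so the example does not depend on the source file encoding.

```cpp
#include <clocale>
#include <cstring>

// Normalise a comparison result to -1, 0 or 1 so results are easy to check.
int sign(int v) { return (v > 0) - (v < 0); }

// Byte-wise comparison: the locale is never consulted.
int compare_bytes(const char* a, const char* b) {
    return sign(std::strcmp(a, b));
}

// Locale-aware comparison: uses the LC_COLLATE category of the current locale.
int compare_collated(const char* a, const char* b) {
    return sign(std::strcoll(a, b));
}

// Tries to switch LC_COLLATE to the named locale.
// Returns false if that locale is not available on this system.
bool use_collation(const char* name) {
    return std::setlocale(LC_COLLATE, name) != nullptr;
}
```

compare_bytes("\xC3\xA9", "f") is positive on any system, because 0xC3 is greater than 'f' (0x66). If a French locale such as "fr_FR.UTF-8" happens to be installed and use_collation("fr_FR.UTF-8") succeeds, compare_collated("\xC3\xA9", "f") should come out negative instead. That locale name is an assumption and varies between systems.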

    However, in many cases strcmp() is enough because you don't need to present the result ordered according to the language (locale) in use. For example, if you just need quick access to a large amount of data indexed by a string, you'd use a map keyed by that string. Sorting those keys with strcoll() would probably be pointless, and strcoll() is generally much slower than strcmp().
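A rough sketch of that split, assuming a hypothetical Index type and keys_for_display() helper: the map keeps its default byte-wise key order for lookups, and strcoll() is only used when the keys are about to be shown to a user.

```cpp
#include <algorithm>
#include <cstring>
#include <map>
#include <string>
#include <vector>

// Lookup table: std::map orders std::string keys byte-wise,
// which is all that is needed to find an entry quickly.
using Index = std::map<std::string, int>;

// Returns the keys ordered for display, using the current LC_COLLATE locale.
std::vector<std::string> keys_for_display(const Index& index) {
    std::vector<std::string> keys;
    keys.reserve(index.size());
    for (const auto& entry : index) {
        keys.push_back(entry.first);
    }
    // Pay the cost of locale-aware comparison only here.
    std::sort(keys.begin(), keys.end(),
              [](const std::string& a, const std::string& b) {
                  return std::strcoll(a.c_str(), b.c_str()) < 0;
              });
    return keys;
}
```

Lookups such as index.at("fig") never touch the locale. Only the list produced by keys_for_display() depends on it.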

    For details about characters you may also want to check out the Unicode website.

    As for the locale, it describes the user's language and regional conventions: collation order, character classification, number, date and currency formats, and so on. By default a program starts in the "C" locale (more or less, no locale) and stays there until it calls setlocale(). Calling setlocale(LC_ALL, "") selects the locale named by the user's environment, which is controlled by variables such as LANG and LC_ALL plus per-category ones like LC_COLLATE. In general you use predefined functions that automatically take the current locale into account and do the right thing for you (e.g. format dates and times, format numbers and measures, convert between upper and lower case, etc.).
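As a small sketch of how a program queries and adopts a locale, the two helpers below wrap std::setlocale(). The names current_locale and adopt_environment_locale are hypothetical. Only std::setlocale and the LC_* macros come from the standard library.

```cpp
#include <clocale>
#include <string>

// Returns the name of the locale currently in effect for one category,
// e.g. current_locale(LC_COLLATE). Passing nullptr only queries the
// setting and does not change it.
std::string current_locale(int category) {
    const char* name = std::setlocale(category, nullptr);
    return name ? name : "";
}

// Switches every category to whatever the user's environment asks for
// (LC_ALL, LC_COLLATE, LANG, ...). Returns false if the environment
// names a locale that is not available on this system.
bool adopt_environment_locale() {
    return std::setlocale(LC_ALL, "") != nullptr;
}
```

Right after startup, current_locale(LC_COLLATE) reports "C". After a successful adopt_environment_locale(), it reports whatever the environment selected, and strcoll() starts following that locale's collation rules.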