
Are string functions ASCII-safe in PHP?


Some PHP string functions (strtoupper and others) are locale-dependent. It is still not clear to me whether the locale matters when I know for certain that a particular string consists only of ASCII (0-127) characters. Is it guaranteed that strtoupper('abc..xyz') will always return ABC..XYZ regardless of locale? Do PHP string functions behave the same in the ASCII range regardless of locale?

While the answer for strtoupper is important to me, the question is more general and covers the whole string function library.

I want to be sure that the user-selected locale (on a multi-language site) will not break my core functionality, which has nothing to do with internationalization.


Solution

  • Do PHP string functions work the same in ASCII range independent from locale?

    No, I'm afraid not. The primary counterexample is the dreaded Turkish dotted-I:

    setlocale(LC_CTYPE, "tr_TR");  // returns false if this locale is not installed
    echo strtoupper('hi!');
    
    -> 'H\xDD!' ('Hİ!' in ISO-8859-9)
    

    In the worst case you may have to provide your own locale-independent string handling. Calling setlocale to revert to C (or some other known locale) is a partial fix, but the POSIX process-level locale model is a poor fit for modern client/server apps: the locale is global to the process, so changing it for one request can affect anything else running in that process.
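
As a minimal sketch of the "own locale-independent string handling" mentioned above, one option is to map the 26 ASCII letters explicitly with strtr, which translates byte for byte and does not consult the locale. The helper names ascii_strtoupper and ascii_strtolower are hypothetical and not part of PHP. This mostly matters on older PHP versions: according to the manual's changelog, since PHP 8.2 strtoupper and strtolower no longer depend on setlocale and convert only ASCII letters.

```php
<?php
// Hypothetical helpers (not part of PHP): ASCII-only case conversion
// that gives the same result whatever setlocale() has been set to.

/** Upper-case only the bytes a-z; every other byte passes through unchanged. */
function ascii_strtoupper(string $s): string
{
    // Three-argument strtr() replaces each byte found in the second
    // argument with the byte at the same position in the third.
    return strtr($s, 'abcdefghijklmnopqrstuvwxyz', 'ABCDEFGHIJKLMNOPQRSTUVWXYZ');
}

/** Lower-case only the bytes A-Z; every other byte passes through unchanged. */
function ascii_strtolower(string $s): string
{
    return strtr($s, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz');
}

echo ascii_strtoupper('hi!'), "\n"; // prints HI!
```

Because the mapping is spelled out, bytes outside a-z and A-Z (including any high-bit bytes of ISO-8859-9 or UTF-8 text) are left as they are, which is usually what core, non-internationalized logic such as identifiers or protocol keywords needs.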