Tags: php, utf-8, ansi

Language specific characters to regular English chars


I am not sure where to start with this, but here is what I want to do:

Users have a text field where they need to enter a few words. The problem is that the page will be used by people from different countries, and they will enter "weird" Latin characters like: ž, Ä, Ü, đ, Ť, Á etc.

Before saving to the database I want to convert them to z, A, U, d, T, A... Is there a way to do this without writing something like this (I think there are too many characters to cover):

 $string = str_replace(array('Č','Ä','Á','đ'), array('C','A','A','d'), $string);

And yes, I know that I can store UTF-8 in the database, but the problem is that this string will later be sent by SMS, and because of the nature of the SMS protocol, these "special" characters take up more space in a message than regular English alphabet characters (I am limited to 120 characters, and if I put "Ä" in the message, it will take up more than one character's worth of space).


Solution

  • First of all, I would still store the original characters as UTF-8 in the database. You can always "translate" them to ASCII characters upon retrieval. This is good because if, say, SMS adds UTF-8 support in the future (or you want to use the user data for something else), you'll have the original characters intact.

    That said, you can use iconv to do this:

    $ascii = iconv('UTF-8', 'ASCII//TRANSLIT', $input);  // $input contains the "weird" characters; $ascii holds the converted copy
    

    See this thread for more info, including some caveats of this approach: PHP: Replace umlauts with closest 7-bit ASCII equivalent in an UTF-8 string
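
As a rough sketch of the "store UTF-8, translate on retrieval" idea, the iconv call could be wrapped in a small helper. The function name to_sms_ascii() and the "?" fallback are assumptions made for illustration and are not part of the original answer. The exact characters that //TRANSLIT produces depend on the iconv implementation and the current locale, so the sketch only guarantees that the result is plain ASCII:

```php
<?php
// Hypothetical helper: the name and the fallback behaviour are illustrative
// assumptions, not something the original answer prescribes.
function to_sms_ascii(string $input): string
{
    // Ask iconv for the closest ASCII equivalents. What //TRANSLIT returns
    // for a given character varies between iconv implementations and locales.
    $ascii = function_exists('iconv')
        ? @iconv('UTF-8', 'ASCII//TRANSLIT', $input)
        : false;

    if ($ascii === false) {
        // Conversion failed or is unsupported on this platform: replace each
        // non-ASCII character with "?". If the input is not valid UTF-8,
        // the /u pattern yields null, so fall back to a byte-wise replacement.
        $ascii = preg_replace('/[^\x00-\x7F]/u', '?', $input)
            ?? preg_replace('/[\x80-\xFF]/', '?', $input);
    }

    return $ascii;
}
```

The original UTF-8 string would stay in the database, and to_sms_ascii() would be applied only when building the SMS. Because the result is pure ASCII, strlen() on it gives the character count that can be checked against the 120-character limit.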