php internet-explorer encoding utf-8 tinymce

UTF-8 encoding with Internet Explorer: %u20AC to €


I'm currently using TinyMCE as the HTML editor for users of my CMS. Somehow the euro symbol (€) is converted to %u20AC by IE (any version).

After a short search I found this. It lists many different encodings of the UTF-8 euro symbol, but not %u20AC, with the percent sign.

I have sent the proper headers for UTF-8, so I guess IE is just being rude and doing things its own way...

Is there a PHP function that can catch this strange encoding and convert it to a normal HTML entity (hex, decimal, or named)? I could just str_replace() this single problem symbol, but I'd rather fix all possible conflicts at once.

Or should I simply replace %u with &#x, disabling normal usage of %u?


Solution

  • %u20AC is the Unicode-escaped form of €, as generated by the JavaScript escape() function (MDN, ECMA-262). Convert it to UTF-8 for server-side processing.

    The standard PHP urldecode() cannot deal with it (it is a non-standard percent-encoding, see Wikipedia), so you need to use an extended routine:

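    /**
     * Decodes a string that mixes standard percent-encoding with the
     * non-standard %uXXXX sequences produced by JavaScript escape().
     *
     * @param string $string Unicode-escaped and URL-encoded string
     * @return string decoded string
     */
    function utf8_urldecode($string) {
        // Decode ordinary %XX sequences first, then rewrite each
        // %uXXXX sequence as a hexadecimal numeric entity (&#xXXXX;).
        $string = preg_replace(
            "/%u([0-9a-f]{3,4})/i",
            "&#x\\1;",
            urldecode($string)
        );
        // Turn the numeric entities into UTF-8 characters.
        return html_entity_decode($string, ENT_XML1, 'UTF-8');
    }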
    

    Also check if you can configure this behaviour for your TinyMCE.
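
As a complementary sketch, not part of the answer above: escape() emits UTF-16 code units, so a character outside the Basic Multilingual Plane arrives as two consecutive %uXXXX surrogates, which a per-sequence entity replacement would not be expected to combine. The hypothetical helper decode_percent_u() below instead treats each run of %uXXXX sequences as UTF-16BE bytes and converts the run in one step. It assumes the mbstring extension is loaded, and it deliberately leaves ordinary %XX decoding to the caller.

```php
<?php
/**
 * Hypothetical helper (name and behaviour are illustrative assumptions):
 * converts runs of %uXXXX sequences to UTF-8, leaving all other text as is.
 *
 * @param string $input text that may contain %uXXXX sequences
 * @return string text with those sequences replaced by UTF-8 characters
 */
function decode_percent_u(string $input): string
{
    return preg_replace_callback(
        // One or more adjacent %uXXXX sequences, each exactly 4 hex digits.
        '/(?:%u[0-9a-fA-F]{4})+/',
        static function (array $match): string {
            // Drop the "%u" markers so only the hex digits remain.
            $hex = str_replace('%u', '', $match[0]);
            // Each group of 4 hex digits is one big-endian UTF-16 code unit;
            // converting the whole run lets surrogate pairs combine.
            return mb_convert_encoding(hex2bin($hex), 'UTF-8', 'UTF-16BE');
        },
        $input
    );
}
```

For example, decode_percent_u('Price: 5 %u20AC') yields "Price: 5 €". Unlike html_entity_decode(), this approach does not touch entities such as &amp;lt; that may already be present in the editor's HTML.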

