Tags: php, character-encoding, url-encoding

How can I tell whether a URL-encoded string is UTF-8 or Latin-1 in PHP?


I am getting data from various sites through URLs. The URL parameters are encoded with the PHP urlencode() function, but the underlying character encoding can still be either UTF-8 or Latin-1.

For example, the é character becomes %C3%A9 when URL-encoded from UTF-8, but %E9 when URL-encoded from Latin-1.
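As a minimal illustration of that difference, the sketch below writes the é character as explicit bytes in each encoding, so the result does not depend on how the script file itself is saved. The variable names are only for illustration.

```php
<?php

// "é" as raw bytes in each encoding.
$utf8   = "\xC3\xA9"; // UTF-8: two bytes
$latin1 = "\xE9";     // Latin-1 (ISO-8859-1): one byte

echo urlencode($utf8), "\n";   // %C3%A9
echo urlencode($latin1), "\n"; // %E9
```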

When I receive data through a URL, I call urldecode() and then need to know the character encoding, so that I can apply utf8_encode() where necessary before inserting the values into a MySQL database.

Strangely, the following code doesn't work:

    $x1 = 'Cl%C3%A9ment';
    $x2 = 'Cl%E9ment';

    echo mb_detect_encoding(urldecode($x1)) . ' / ' . mb_detect_encoding(urldecode($x2));

It prints UTF-8 / UTF-8.

Why is that, what am I doing wrong, and how can I determine the character encoding of those strings?

Thanks


Solution

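  • mb_detect_encoding() is of little use with its default second parameter: it then only considers the encodings in the default detection order (typically ASCII and UTF-8), so ISO-8859-1 is never a candidate. Pass the list of candidate encodings explicitly: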

    <?php
    
    $x1 = 'Cl%C3%A9ment';
    $x2 = 'Cl%E9ment';
    
    $encoding_list = array('utf-8', 'iso-8859-1');
    
    var_dump(
        mb_detect_encoding(urldecode($x1), $encoding_list),
        mb_detect_encoding(urldecode($x2), $encoding_list)
    );
    

    ... prints:

    string(5) "UTF-8"
    string(10) "ISO-8859-1"
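
Since the goal is to store everything as UTF-8, detection and conversion can be combined in one step. The following is a minimal sketch, not a definitive implementation: the function name decode_to_utf8() is hypothetical, and it assumes the input is always either UTF-8 or ISO-8859-1. It uses mb_check_encoding() to test whether the decoded bytes are valid UTF-8, and mb_convert_encoding() in place of utf8_encode(), which is deprecated as of PHP 8.2.

```php
<?php

/**
 * Hypothetical helper: URL-decode a value and return it as UTF-8.
 * Assumption: the input is either UTF-8 or ISO-8859-1, nothing else.
 */
function decode_to_utf8(string $encoded): string
{
    $raw = urldecode($encoded);

    // The bytes already form valid UTF-8: keep them unchanged.
    if (mb_check_encoding($raw, 'UTF-8')) {
        return $raw;
    }

    // Otherwise interpret the bytes as ISO-8859-1 and convert them.
    return mb_convert_encoding($raw, 'UTF-8', 'ISO-8859-1');
}

// Both inputs should end up as the same UTF-8 string.
var_dump(decode_to_utf8('Cl%C3%A9ment') === decode_to_utf8('Cl%E9ment'));
```

One limitation of this approach: a Latin-1 string whose bytes happen to form valid UTF-8 would be left unconverted. That is rare in ordinary text, but the check is a heuristic rather than a guarantee.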