I am getting data from various sites through URLs. The URL parameters are URL-encoded with the PHP urlencode()
function, but the underlying character encoding can be either UTF-8 or Latin-1.
For example, the é character becomes %C3%A9 when URL-encoded from UTF-8, but %E9 when URL-encoded from Latin-1.
When I receive data through a URL, I call urldecode()
and then I need to know the character encoding, so that I can apply utf8_encode if necessary
before inserting the values into a MySQL database.
Strangely, the following code doesn't work:
$x1 = 'Cl%C3%A9ment';
$x2 = 'Cl%E9ment';
echo mb_detect_encoding(urldecode($x1)).' / '.mb_detect_encoding(urldecode($x2));
It returns UTF-8 / UTF-8
Why is that, what am I doing wrong, and how can I find out the character encoding of those strings?
Thanks
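mb_detect_encoding()
is of little use with its default second parameter, because the default detection order usually contains only ASCII and UTF-8, so ISO-8859-1 is never a candidate. Pass an explicit list of candidate encodings, with UTF-8 first: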
<?php
$x1 = 'Cl%C3%A9ment';
$x2 = 'Cl%E9ment';
$encoding_list = array('utf-8', 'iso-8859-1');
var_dump(
    mb_detect_encoding(urldecode($x1), $encoding_list),
    mb_detect_encoding(urldecode($x2), $encoding_list)
);
... prints:
string(5) "UTF-8"
string(10) "ISO-8859-1"
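Since the goal is to store everything as UTF-8, it may be simpler to test whether the decoded bytes are valid UTF-8 and convert only when they are not. The sketch below assumes the input can only ever be UTF-8 or ISO-8859-1, as described in the question; to_utf8() is a hypothetical helper name, not something from the question. It uses mb_convert_encoding() rather than utf8_encode(), which is deprecated as of PHP 8.2.

```php
<?php
// Hypothetical helper: URL-decode a parameter and return it as UTF-8.
// Assumption: the sender used either UTF-8 or ISO-8859-1, nothing else.
function to_utf8(string $raw): string
{
    $decoded = urldecode($raw);

    // Valid UTF-8 byte sequences are kept untouched.
    if (mb_check_encoding($decoded, 'UTF-8')) {
        return $decoded;
    }

    // Anything else is treated as ISO-8859-1 and converted.
    return mb_convert_encoding($decoded, 'UTF-8', 'ISO-8859-1');
}

// Both spellings end up as the same UTF-8 string.
var_dump(to_utf8('Cl%C3%A9ment') === to_utf8('Cl%E9ment'));
```

One caveat: a Latin-1 string whose bytes happen to form valid UTF-8 would be misread, but that is unlikely for ordinary text such as names.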