I did the following:
htmlspecialchars($string, ENT_COMPAT, 'UTF-8')
where $string is the string containing the special ü character. It gives me an error: Invalid multibyte sequence in argument. When I replace 'UTF-8' with 'ISO8859-1', no error is thrown, but the wrong character is shown (the 'unknown character' character, which looks like <?>).
If I use an HTML form to update the string in the database, the error disappears and the character is displayed correctly. However, when I then look at the record in Navicat, it looks like two characters:
[1/4][A with something on top of it]
Some multibyte sequence that isn't seen as one character.
What is going on, where are things going wrong, and what can I do about it?
Although I don't understand where the "invalid multibyte" error comes from, I'm pretty sure htmlspecialchars() is not your culprit:
For the purposes of this function, the charsets ISO-8859-1, ISO-8859-15, UTF-8, cp866, cp1251, cp1252, and KOI8-R are effectively equivalent, as the characters affected by htmlspecialchars() occupy the same positions in all of these charsets.
In my understanding, htmlspecialchars() should work fine for a UTF-8 string without specifying a character set. My bet would be that either the HTML page containing the form or the database connection you use is not UTF-8 encoded. For the latter, try sending a
SET NAMES utf8;
to MySQL before doing the insert.
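To narrow down which side is not UTF-8, it may help to look at the raw bytes of the string before escaping it. The following is only a sketch, not a definitive fix: escape_as_utf8() is a hypothetical helper name, the fallback source encoding ISO-8859-1 is an assumption about your data, and it needs the mbstring extension. It checks whether the bytes are valid UTF-8, converts them if they are not, and only then calls htmlspecialchars().

```php
<?php
// Hypothetical helper (not from the original post): escape a string for HTML,
// converting it to UTF-8 first when its bytes are not valid UTF-8.
function escape_as_utf8(string $s, string $assumedSource = 'ISO-8859-1'): string
{
    if (!mb_check_encoding($s, 'UTF-8')) {
        // Assumption: anything that is not valid UTF-8 is in $assumedSource.
        $s = mb_convert_encoding($s, 'UTF-8', $assumedSource);
    }
    return htmlspecialchars($s, ENT_COMPAT, 'UTF-8');
}

$latin1 = "M\xFCller <b>";      // "ü" as the single ISO-8859-1 byte 0xFC
$utf8   = "M\xC3\xBCller <b>";  // "ü" as the two UTF-8 bytes 0xC3 0xBC

// Dump the bytes to see which encoding a string is really in.
echo bin2hex($latin1), "\n";
echo bin2hex($utf8), "\n";

// Both inputs end up as the same UTF-8, HTML-escaped string.
echo escape_as_utf8($latin1), "\n";
echo escape_as_utf8($utf8), "\n";
```

If bin2hex() shows a lone fc where the ü should be, the string reaching htmlspecialchars() is ISO-8859-1 rather than UTF-8, which would be consistent with the "invalid multibyte sequence" error. If it shows c3bc, the string is already UTF-8; those same two bytes read as ISO-8859-1 are what show up as two separate characters in a client that uses a different connection charset.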