PHP htmlspecialchars() function error when trying to use UTF-8 string


I did the following things:

  1. I have a spreadsheet with data. One of the rows has a ü character in it.
  2. I save this as a CSV file in OpenOffice.org. When it asks me for a character encoding, I choose UTF-8.
  3. I use Navicat to create a MySQL database table (InnoDB, UTF-8 character set, utf8_general collation) and import the CSV.
  4. I call the PHP function htmlspecialchars($string, ENT_COMPAT, 'UTF-8'), where $string is the string containing the special ü character.

It gives me an error: Invalid multibyte sequence in argument. When I replace 'UTF-8' with 'ISO8859-1', no error is thrown, but the wrong character is shown (the 'unknown character' glyph, which looks like <?>).

If I use an HTML form to update the string in the database, the error disappears and the character is displayed correctly. However, when I then look at the record in Navicat, it looks like two characters:

[1/4][A with some thing on top of it]

Some multibyte sequence that isn't seen as one character.

What is going on, where are things going wrong, and what can I do about it?


Solution

  • Although I don't understand where the "invalid multibyte" error comes from, I'm pretty sure htmlspecialchars() is not your culprit:

    For the purposes of this function, the charsets ISO-8859-1, ISO-8859-15, UTF-8, cp866, cp1251, cp1252, and KOI8-R are effectively equivalent, as the characters affected by htmlspecialchars() occupy the same positions in all of these charsets.

    In my understanding, htmlspecialchars() should work fine for a UTF-8 string without specifying a character set. My bet would be that either the HTML page containing the form, or the database connection you use is not UTF-8 encoded. For the latter, try sending a

    SET NAMES utf8;
    

    to MySQL before doing the insert.
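
To narrow down where the encoding goes wrong, it may help to look at the raw bytes PHP actually receives and to set the connection charset explicitly. The following is only a minimal sketch, assuming PHP with the mysqli extension; the function names describeBytes and openUtf8Connection and all connection parameters are hypothetical and not taken from the question.

```php
<?php

/**
 * Report whether $s is valid UTF-8 and show its raw bytes as hex.
 * preg_match() with the /u modifier returns false on invalid UTF-8,
 * so comparing against 1 gives a simple validity check.
 */
function describeBytes(string $s): array
{
    return [
        'valid_utf8' => preg_match('//u', $s) === 1,
        'hex'        => bin2hex($s),
    ];
}

/**
 * Hypothetical helper: open a mysqli connection and switch it to UTF-8.
 * All four parameters are placeholders for your own settings.
 */
function openUtf8Connection(string $host, string $user, string $pass, string $db): mysqli
{
    $mysqli = new mysqli($host, $user, $pass, $db);
    if ($mysqli->connect_errno) {
        throw new RuntimeException('Connect failed: ' . $mysqli->connect_error);
    }
    // Sets the connection charset, like SET NAMES utf8, and also tells
    // the client library which charset is in use.
    if (!$mysqli->set_charset('utf8')) {
        throw new RuntimeException('Could not set charset: ' . $mysqli->error);
    }
    return $mysqli;
}

// "ü" encoded as UTF-8 is the two bytes C3 BC;
// encoded as ISO-8859-1 it is the single byte FC.
var_dump(describeBytes("\xC3\xBC"));
var_dump(describeBytes("\xFC"));
```

If the value read from the database comes back as the single byte fc, it reached PHP as ISO-8859-1 rather than UTF-8, which would be consistent with the "Invalid multibyte sequence" error. A value of c383c2bc would instead suggest that the UTF-8 bytes were converted a second time somewhere along the way.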