Search code examples
mysqlutf-8load-data-infilems-access-2010

UTF-8 characters display differently after import to mySQL


I have a mySQL database full of accented characters. The DB is populated periodically from MS Access 2010.

In Access you'll see é è à Ü. In the export process in Access UTF-8 encoding is specified.

Open the resulting text file in UltraEdit on my PC and you'll see "Vieux Carré" and UE says the encoding is U8-DOS.

The files are uploaded via FTP and imported via LOAD DATA LOCAL INFILE queries like this

LOAD DATA LOCAL INFILE '$dir/$t.$ext' INTO TABLE `$t` FIELDS  OPTIONALLY ENCLOSED BY '|' TERMINATED BY ';' LINES TERMINATED BY '\n'

In mySQL the field collation is set to utf8_general_ci.

If you query mySQL from the command line or from phpMyAdmin you'll see "Vieux Carré".

What am I doing wrong?


Solution

  • If you're using LOAD DATA INFILE with a file that has a charset that's different from your database's default, you need to specify what character set the file is in:

    LOAD DATA LOCAL INFILE '$dir/$t.$ext'
    INTO TABLE `$t`
    CHARACTER SET utf8
    FIELDS OPTIONALLY ENCLOSED BY '|' TERMINATED BY ';'
    LINES TERMINATED BY '\n'
    

    Note that the database character set is database-wide, and is different than the table character set and the column character set. SHOW CREATE DATABASE database_name will show you the database charset.