Search code examples
mysqlcsvutf-8character-encodingload-data-infile

MySQL treating eszett character as "ß" during LOAD DATA operation


I am trying to import a small data set of Berlin street addresses using MySQL's LOAD DATA statement. The problem is that after the import runs, all of the beautiful ß characters in the German street names have become ß sets.

Here's the create-table statement I used for this table:

CREATE TABLE `subway_distances` (
  `STN` varchar(255) DEFAULT NULL,
  `HNR` int(9) DEFAULT NULL,
  `Lat` decimal(36,15) DEFAULT NULL,
  `Lon` decimal(36,15) DEFAULT NULL,
  `Distance` decimal(45,20) DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8

... and here is my MySQL shell code:

charset utf8;
TRUNCATE TABLE subway_distances;
LOAD DATA LOCAL INFILE '/path/to/output.csv'
    INTO TABLE berlin.subway_distances
    FIELDS TERMINATED BY ',' ENCLOSED BY '"' ESCAPED BY '\\';
SELECT * FROM subway_distances LIMIT 0,10;

I have looked at output.csv in vim, and the eszett character appears to be fine there.

I am assuming that I simply need a different encoding declaration in MySQL, but I'm not sure where to start.

I am also assuming that collation doesn't matter yet, since I'm not comparing values -- just purely trying to get a valid import.


Solution

  • I found an answer to this relatively quickly. It looks like I just need to specify the CHARACTER SET value in my LOAD DATA statement. So the new statement looks like this:

    LOAD DATA LOCAL INFILE '/path/to/output.csv'
        INTO TABLE berlin.subway_distances
        CHARACTER SET 'utf8'
        FIELDS TERMINATED BY ',' ENCLOSED BY '"' ESCAPED BY '\\';