I am trying to import a small data set of Berlin street addresses using MySQL's LOAD DATA statement. The problem is that after the import runs, all of the beautiful ß characters in the German street names have become ß sets.
Here's the create-table statement I used for this table:
CREATE TABLE `subway_distances` (
`STN` varchar(255) DEFAULT NULL,
`HNR` int(9) DEFAULT NULL,
`Lat` decimal(36,15) DEFAULT NULL,
`Lon` decimal(36,15) DEFAULT NULL,
`Distance` decimal(45,20) DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8
... and here is my MySQL shell code:
charset utf8;
TRUNCATE TABLE subway_distances;
LOAD DATA LOCAL INFILE '/path/to/output.csv'
INTO TABLE berlin.subway_distances
FIELDS TERMINATED BY ',' ENCLOSED BY '"' ESCAPED BY '\\';
SELECT * FROM subway_distances LIMIT 0,10;
I have looked at output.csv in vim, and the eszett character appears to be fine there.
I am assuming that I simply need a different encoding declaration in MySQL, but I'm not sure where to start.
I am also assuming that collation doesn't matter yet, since I'm not comparing values -- just purely trying to get a valid import.
I found an answer to this relatively quickly. It looks like I just need to specify the CHARACTER SET value in my LOAD DATA statement. So the new statement looks like this:
LOAD DATA LOCAL INFILE '/path/to/output.csv'
INTO TABLE berlin.subway_distances
CHARACTER SET 'utf8'
FIELDS TERMINATED BY ',' ENCLOSED BY '"' ESCAPED BY '\\';