I'm trying to import this Hungarian text file into a MySQL table but the Hungarian characters are always corrupted. I've tried many encodings for both the import file and table haven't found the right combination.
The file format is one word and one number per line, separated by a space. My import table has two columns, varchar and integer. I'm using MySQL 5.5.16 and phpMyAdmin 3.4.5. phpMyAdmin solution is preferred but I can use the command line if necessary.
Thanks in advance!
EDIT: Broken link above fixed
Your file appears to be encoded in UTF-8. For example:
$ unzip -p hu_50K.zip | sed -n 59p | xxd 0000000: 6bc3 b673 7ac3 b66e c3b6 6d20 3532 3030 k..sz..n..m 5200 0000010: 310d 0a 1..
I understand that "köszönöm" is Hungarian for "thank you". If that is what row 59 of the file is supposed to contain, then the ö
character (U+00F6) is encoded as 0xc3b6
, which is UTF-8.
To import this file using LOAD DATA INFILE
:
LOAD DATA [LOCAL] INFILE '/path/to/hu_50K.txt'
INTO TABLE my_table
CHARACTER SET utf8
FIELDS
TERMINATED BY ' '
LINES
TERMINATED BY '\r\n'
(col_word, col_number)
Of course, col_word
must be able to hold the characters - which it necessarily will if it is also encoded in UTF-8.