Search code examples
mysqlcharacter-encodingmysqlimport

How do I import this file into MySQL?


I'm trying to import this Hungarian text file into a MySQL table but the Hungarian characters are always corrupted. I've tried many encodings for both the import file and table haven't found the right combination.

The file format is one word and one number per line, separated by a space. My import table has two columns, varchar and integer. I'm using MySQL 5.5.16 and phpMyAdmin 3.4.5. phpMyAdmin solution is preferred but I can use the command line if necessary.

Thanks in advance!

EDIT: Broken link above fixed


Solution

  • Your file appears to be encoded in UTF-8. For example:

    $ unzip -p hu_50K.zip | sed -n 59p | xxd
    0000000: 6bc3 b673 7ac3 b66e c3b6 6d20 3532 3030  k..sz..n..m 5200
    0000010: 310d 0a                                  1..
    

    I understand that "köszönöm" is Hungarian for "thank you". If that is what row 59 of the file is supposed to contain, then the ö character (U+00F6) is encoded as 0xc3b6, which is UTF-8.

    To import this file using LOAD DATA INFILE:

    LOAD DATA [LOCAL] INFILE '/path/to/hu_50K.txt'
        INTO TABLE my_table
        CHARACTER SET utf8
        FIELDS
            TERMINATED BY ' '
        LINES
            TERMINATED BY '\r\n'
        (col_word, col_number)
    

    Of course, col_word must be able to hold the characters - which it necessarily will if it is also encoded in UTF-8.