I am using Aurora MySQL 5.7.
I am trying to load data into MySQL with the LOAD DATA statement.
The database encoding is utf8mb4, and an error occurs when I load the data below.
Loading with the latin1 encoding avoids the error, but then Hangul (Korean text) is not displayed properly, so I can't use latin1.
What is the best way to load this data?
Any help would be appreciated.
CSV file (^A is the control character typed as Ctrl+V, Ctrl+A):
"sab:0000","þÿÿÿ^A"
"sab:0000","가나다"
The database character set is also utf8mb4.
CREATE TABLE `stats_string` (
`_key` varchar(128) COLLATE utf8mb4_unicode_ci NOT NULL,
`_value` longtext COLLATE utf8mb4_unicode_ci,
PRIMARY KEY (`_key`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
$1, $2, and $3 are the CSV filename, table name, and database name:
load data local infile '$1' into table $2 CHARACTER SET utf8mb4 fields terminated by ',' enclosed by '\"' ESCAPED BY '\b' lines terminated by '\\r\\n';" $3
ERROR 1300 (HY000) at line 1: Invalid utf8mb4 character string: '"'
All text files, without exception, are encoded in some character set or other. Many files contain only ASCII characters, which are encoded identically in latin-1 and utf8. That's true of most source code files, HTML, and so forth.
Other files are encoded in utf8. Some may be encoded in other character sets.
Your CSV file is encoded in some character set. You've determined that it is not latin-1 or utf8. What character set is it? Ask the person who provided it, or analyze their workflow to figure it out. Then mention that character set in your LOAD DATA command.
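If nobody can tell you the file's encoding, one rough way to narrow it down is to test which candidate encodings decode the file without error. The sketch below assumes the iconv utility is installed; the function name list_clean_encodings and the candidate list are illustrative, not part of the original question. latin-1 is left out on purpose: every byte sequence is valid latin-1, so a clean decode there proves nothing.

```shell
#!/bin/sh
# List the candidate encodings under which a file decodes without error.
# Usage: list_clean_encodings FILE ENCODING...
# (hypothetical helper; encoding names are those understood by iconv)
list_clean_encodings() {
  file=$1
  shift
  for enc in "$@"; do
    # iconv exits non-zero when it meets a byte sequence that is invalid in $enc
    if iconv -f "$enc" -t UTF-8 "$file" >/dev/null 2>&1; then
      printf '%s\n' "$enc"
    fi
  done
}

# Example call (data.csv is a placeholder path):
# list_clean_encodings data.csv UTF-8 EUC-KR CP949
```

A clean decode is only a hint, since several encodings can accept the same bytes. Check that the converted Hangul actually reads correctly before relying on the result. If no candidate decodes cleanly, the file may hold bytes that are not text in any single encoding.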
MySQL has excellent support for converting between character sets. When you mention a character set in a LOAD DATA command, you are instructing MySQL to convert the data from that character set to the character set declared for the table (or column). You're not changing the character set used in the table.
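As a sketch of what that looks like in the script from the question, the helper below prints a LOAD DATA statement whose CHARACTER SET clause names the encoding of the file, while the table itself stays utf8mb4. The function name build_load_sql is hypothetical, and euckr in the example call is only an assumption about the file; substitute whatever encoding the file really uses. The field and line options are copied from the question.

```shell
#!/bin/sh
# Print a LOAD DATA statement whose CHARACTER SET clause names the FILE's
# encoding (not the table's). Hypothetical helper for illustration.
# Arguments: CSV path, table name, MySQL character set name of the file.
build_load_sql() {
  # Unquoted heredoc: $1..$3 are expanded, while \b, \r and \n pass through
  # literally so that MySQL, not the shell, interprets them.
  cat <<EOF
LOAD DATA LOCAL INFILE '$1'
INTO TABLE $2
CHARACTER SET $3
FIELDS TERMINATED BY ',' ENCLOSED BY '"' ESCAPED BY '\b'
LINES TERMINATED BY '\r\n';
EOF
}

# Example call (assumes the file is euckr; mydb is a placeholder database):
# build_load_sql data.csv stats_string euckr | mysql --local-infile=1 mydb
```

Printing the statement first makes it easy to inspect the quoting before anything is sent to the server.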
You mentioned in a comment that some sort of policy prevents you from using the euckr character set. Maybe your CSV file comes to you encoded in that character set (I don't know whether it does; I don't have your file to look at). If so, you will have to convert it upon loading. This sort of conversion requirement is very common when loading database tables.
Of course your data tables should be encoded in utf8mb4 and not some legacy character set. But you may have to convert some of your source data from legacy character sets as you load it.