Search code examples
mysqlpostgresqlpostgresql-copy

error using copy from in postgresql when value is '2"%'


I have a table with 300,000 rows not under my control that I need to import.

I export the data from mysql using:

mysqldump -u root --password=secret --fields-enclosed-by='\"' -T/tmp apflora_beob

Then I try to import each table using for example:

\copy beob_evab FROM '/Users/alex/tmp/beob_evab.txt' (FORMAT 'csv', DELIMITER E'\t', NULL '\N', ESCAPE E'\"', ENCODING 'UTF8');

This works for most tables, also for one containing 450,000 rows. But on one I get this error:

ERROR:  value too long for type character varying(10)
CONTEXT:  COPY beob_evab, line 190310, column COUV_MOUSSES: "2\%  \N  \N  \N  \N  \N  \N  \N  \N  \N  \N  30  \N  15  \N  \N  \N  \N  \N  \N  \N  \N  \N  \N  \N  \N  0.01  \N  \N
Bachs, Dau..."

When I check field COUV_MOUSSES in line 190'310 it contains this value: 2"%.

What would I have to change to get this to work?

In order to achieve the correct output "2""%" according to @klin (thanks!) I would have to do add --fields-escaped-by='"' to the mysqldump command. So that becomes:

mysqldump -u root --password=secret --fields-enclosed-by='"' --fields-escaped-by='"' -T/tmp apflora_beob

Problem is: now Null values are exported as "N.

Well, no problem, I thought and changed the copy command to:

\copy beob_evab FROM '/Users/alex/tmp/beob_evab.txt' (FORMAT 'csv', DELIMITER E'\t', NULL '"N', ESCAPE "'", ENCODING 'UTF8');

This produces the following error in Postgres:

ERROR:  CSV quote character must not appear in the NULL specification

And it seems that the representation of Null values can not be changed in MySQL.


Solution

  • The value should be written down as

    "2""%"
    

    Per RFC 4180 - Common Format and MIME Type for Comma-Separated Values (CSV) Files:

    [5]. Each field may or may not be enclosed in double quotes (however some programs, such as Microsoft Excel, do not use double quotes at all). If fields are not enclosed with double quotes, then double quotes may not appear inside the fields.(...)

    [7]. If double-quotes are used to enclose fields, then a double-quote appearing inside a field must be escaped by preceding it with another double quote. For example:

       "aaa","b""bb","ccc"