Search code examples
postgresqlpostgresql-copy

How to make a CSV file with arrays of varchar containing double quotes for PostgreSQL COPY


I am preparing a CSV for loading onto a table using the SQL command:

COPY table FROM filename WITH (FORMAT csv, HEADER)

This particular table has a varchar[] (array of strings) column, and some of the items in these arrays contain special characters such as double quotes (") and backslashes (\), some of which can break the CSV format or the array notation. Examples:

"quote"
x1"x2
C:\Users

I've made a few attempts so far (attempt is the actual content of the column in the CSV file):

{"""quote""","x1""x2","C:\Users"}
ERROR:  extra data after last expected column
CONTEXT:  COPY table, line 1: "{"""quote""","x1""x2","C:\Users"}"

"{"""quote""","x1""x2","C:\Users"}"
Inserted value is missing characters:
{quote,x1x2,C:Users}

"{"\""quote\""","x1\""x2","C:\\Users"}"
Inserted value is missing characters:
{"quote\"",x1x2,"C:\\Users"}

What is the correct syntax to insert values with double quotes and backslashes from a CSV?

PostgreSQL version is 13.


Solution

  • The varchar[] column must be formatted using a CSV-formatted version of PostgreSQL array notation. Unfortunately, as my example shows, this can get quite hairy.

    Here are some examples how the column should be quoted:

    copy (select
        'other-column'::text, 
        '{foo,bar,baz,Lucky''s Bar,"dbl\"quote",literal-backslash\\,brackets\[\]\{\}}'::varchar[],
        'another-column'::text
    ) to stdout with (format csv);
    other-column,"{foo,bar,baz,Lucky's Bar,""dbl\""quote"",""literal-backslash\\"",""brackets[]{}""}",another-column
    

    PostgreSQL has this to say on quoting array elements:

    [W]hen writing an array value you can use double quotes around any individual array element. You must do so if the element value would otherwise confuse the array-value parser. For example, elements containing curly braces, commas (or the data type's delimiter character), double quotes, backslashes, or leading or trailing whitespace must be double-quoted. Empty strings and strings matching the word NULL must be quoted, too. To put a double quote or backslash in a quoted array element value, precede it with a backslash. Alternatively, you can avoid quotes and use backslash-escaping to protect all data characters that would otherwise be taken as array syntax.

    As with all CSV documents, if your column data contains quotes, spaces, commas, or other special characters, the column must be CSV-quoted. Or just always quote them to be safe. Normal CSV output libraries will auto-quote columns in your data.