Search code examples
postgresqlcsvnullpostgresql-copy

Quoted NULL values makes PostgreSQL COPY command fail


I have a big CSV file with all columns quoted with ". Null values are represented as "", and column separator is |. I would like to use the COPYcommand from postgresql (version 10.7) to ingest those files. I have tried many combination, but the more natural to me is the following:

COPY test.large
FROM '/path/to/big.file'
WITH (
    FORMAT CSV,
    HEADER,
    DELIMITER '|',
    QUOTE '"',
    NULL ''
);

My underlying table expects a column for integers and there are lines where NULL value are set to ...|""|... instead of some number ...|"123456"|.... And unfortunately, this makes COPY to crash, saying:

ERREUR:  syntaxe en entrée invalide pour l'entier : «  »
CONTEXT:  COPY regpat_pct_app_reg, ligne 2743, colonne appid : «  »

Sorry, it is in French from this terminal. Anyway, it says: Invalid syntax for integer : « » At the line 2743 we found:

...000205"|""|"XY...

Which is a NULL value, but I cannot found how to properly setup COPY command switch to make postgresql ingest those files.

The documentation says:

NULL

Specifies the string that represents a null value. The default is \N (backslash-N) in text format, and an unquoted empty string in CSV format. You might prefer an empty string even in text format for cases where you don't want to distinguish nulls from empty strings. This option is not allowed when using binary format.

I am aware that I can clean the file by changing ...|""|... to ...||... using sed or some regexp/replace operations. This will solve my problem, I have checked it.

What I am wondering: is it possible to bear it with postgresql, after all this is perfectly valid CSV format.

Update

Reading the following post as suggested, I wrote:

COPY test.large
FROM '/path/to/big.file'
WITH (
        FORMAT CSV,
        HEADER,
        DELIMITER '|',
        QUOTE '"',
        NULL '',
        FORCE_NULL appid
    );

I get:

ERREUR:  l'argument de l'option « force_null » doit être une liste de noms de colonnes

Which translates to force_null must be a list of column names. Any idea?


Solution

  • It seems it is a slight inconsistency in the COPY command switches, because the following call:

    COPY test.large
    FROM '/path/to/big.file'
    WITH CSV HEADER DELIMITER '|' QUOTE '"' FORCE NULL appid;
    

    Works as expected or you must add the _ if you use the key value pair notation:

    COPY test.large
    FROM '/path/to/big.file'
    WITH (
        FORMAT CSV,
        HEADER,
        DELIMITER '|',
        QUOTE '"',
        FORCE_NULL (appln_id)
    );