Tags: postgresql, csv, foreign-data-wrapper

PostgreSQL 9.5.7: Is it possible to skip rows with missing data when inserting from a foreign data wrapper table (around a CSV)?


So I have this big CSV file included in my database as a foreign data wrapper table, and I want to copy its content into a proper table. But the CSV is partially malformed and raises 'extra data' or 'missing data' errors on multiple occasions, because some lines have too many or too few values. I know I could fix the CSV file manually before importing, but I would prefer a way of just skipping those lines. Is there a way to do this?

The fdw tables are created by:

CREATE EXTENSION file_fdw;
CREATE SERVER import_server FOREIGN DATA WRAPPER file_fdw;
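
For comparison, a per-column CSV foreign table over the same file would look like the sketch below (the column names and types are illustrative, not taken from the question); it is a table like this that fails with 'extra data' / 'missing data' as soon as a line has the wrong number of fields:

```sql
-- Hypothetical per-column declaration of the same CSV file.
-- SELECTs on this table abort when a malformed line is read.
CREATE FOREIGN TABLE table_csv_columns (
    id     INTEGER,
    value1 TEXT,
    value2 TEXT
)
SERVER import_server OPTIONS (
    filename './data.csv', format 'csv');
```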

Solution

  • So, I have succeeded by working around the issue in the following way:

    I have included the CSV file as a plain-text foreign table with this command:

    CREATE FOREIGN TABLE table_csv (
        value TEXT
    )
    SERVER import_server OPTIONS (
        filename './data.csv', format 'text');
    

    This way the data in the CSV is not separated into its columns; every line ends up bunched together in one ugly text column.
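
    The reason this helps is how SPLIT_PART behaves: it returns the n-th delimiter-separated field of a string, and when the line has fewer than n fields it returns an empty string instead of raising an error. A quick sketch with made-up values:

```sql
-- SPLIT_PART(string, delimiter, n): the n-th field, or '' if absent.
SELECT SPLIT_PART('42,foo,bar', ',', 1) AS id,      -- '42'
       SPLIT_PART('42,foo,bar', ',', 3) AS value2,  -- 'bar'
       SPLIT_PART('42,foo',     ',', 3) AS missing; -- ''
```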

    Then I insert into a table with the proper column structure by splitting the foreign table's single column with a command like this:

    CREATE TABLE table_target (
        id INTEGER PRIMARY KEY,
        value1 TEXT,
        value2 TEXT );   
    
    
    INSERT INTO table_target (id, value1, value2)
    SELECT tmp.id::INTEGER, tmp.value1, tmp.value2
    FROM ( SELECT
            SPLIT_PART(value, ',', 1) AS id,
            SPLIT_PART(value, ',', 2) AS value1,
            SPLIT_PART(value, ',', 3) AS value2
           FROM table_csv ) AS tmp
    WHERE tmp.id ~ '^[0-9]+$'
      AND tmp.value1 != ''
      AND tmp.value2 != '';
    

    After extensive testing I could import even malformed CSVs, filtering out the invalid rows with the help of the final WHERE clause.
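
    Inverting that WHERE clause also gives a cheap audit of what gets skipped; a sketch (same table and delimiter as above) that lists the rejected raw lines:

```sql
-- Raw CSV lines the filter would reject, for manual inspection.
SELECT value
FROM table_csv
WHERE SPLIT_PART(value, ',', 1) !~ '^[0-9]+$'
   OR SPLIT_PART(value, ',', 2) = ''
   OR SPLIT_PART(value, ',', 3) = '';
```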

    It's a bit slow, but it is clean and needs no external intermediate CSV-cleaning process in between.