Search code examples
postgresqlcsvcommand-line-interfacedataloader

Best way to bulk load CSV file into PostgreSQL table


I have a tab separated CSV file in D:\DataSet\business_names_202007/businessDataSet.csv with 600k records (which may increase). I want to load entire data into below postgresql table.

PostgreSQL table:

PostgreSQL Structure

CSV file structure:

CSV File Structure

As you can see above structure, column count differs between CSV file and db ("transform_business" column extra in DB). while loading data we need to add this as well, the value for this column is given below.

"transform_business" column value in db is same as "BN_NAME" in the DataSet with following changes: Convert to UPPERCASE and all spaces should be removed between words

eg:

  • BN_NAME: Melbourne Collision Repair

  • transform_business: MELBOURNECOLLISIONREPAIR

  • Tool: Dbeaver

  • DB Schema: testDev

  • tablename: testdevtable

Tutorial link

Temporarily formed copt statement:

COPY testdevtable(register_name,bn_name,bn_status,transform_business) 
FROM 'D:\DataSet\business_names_202007/businessDataSet.csv' DELIMITER E’\t’ CSV HEADER;

Solution

  • If it where me I would add a ON INSERT trigger to business table that transforms bn_name --> transform_business during the INSERT. Then leave transform_business out of the COPY. The second option is to do the transformation before you load it and add transform_business and the new data to the CSV file.

    An example SQL code snippet that will do the transformation:

    SELECT upper(replace('Melbourne Collision Repair Centre Mentone', ' ', ''));
                     upper                 
    ---------------------------------------
     MELBOURNECOLLISIONREPAIRCENTREMENTONE
    
    

    Another option is to just COPY (register_name,bn_name,bn_status) into the database(with no trigger on table) and then run:

    UPDATE business SET transform_business = upper(replace(bn_name, ' ', ''));
    
    

    Not sure what happens after that, whether the transform_business value is going to be entered with new data or not. If the user/application is not going to enter it then I think you are back to a trigger on business table that runs upper(replace(bn_name, ' ', '')).