Search code examples
sqlarrayspostgresqlcsvidentifier

How to change default identifier {..} for arrays in postgres?


I have large .csv files which include arrays with [..] as identifier. Obviously postgres has default {..} for arrays. How can i change it to import them properly and quickly? I'm not supposed to configure the .csv files. In the below example our array is genres.

Example:

create table title_basics (
    tconst text primary key,
    titleType text not null,
    primaryTitle text not null,
    genres text ARRAY
);


copy title_basics(tconst,titleType,primaryTitle,genres)
from 'C:\....\title_basics.csv' delimiter ',' csv header;

Dataset example:

tconst,titleType,primaryTitle,genres
t0001,movie,Miss Jerry,"[""Action"",""Comedy""]"

Solution

  • Best Practice:

    1. Take the field as text from .csv
    2. Update the column and replace the stuff you don't want.
    3. With alter table transform your column to table.

    Example code:

    create table title_basics 
    (
        ...
        genres text
    );
    

    copy title_basics(tconst,...,genres)
    from 'C:...\title_basics.csv' delimiter ',' csv header;
    

    update title_basics set genres=replace(genres,'"','');
    update title_basics set genres=replace(genres,'[','');
    update title_basics set genres=replace(genres,']','');
    alter table title_basics
        alter genres type text[] using string_to_array(genres,',');