Search code examples
postgresqlshellunixawkpsql

Copy contents of a Unix File to Postgres Table column


I have a requirement where I am receiving a XML file on a Unix path. I want to copy the content of this xml file in a Postgres table in one column.

Example file received at : /home/bojack/test.xml

<address>
   <street>Main Street</street>
   <city>New York</city>
   <country>USA</country>
</address>

Want to copy this to a table with 2 columns,

id value
1  <address>
   <street>Main Street</street>
   <city>New York</city>
   <country>US</country>
   </address>

Could anyone please help on this?


Solution

  • First the table:

    # CREATE TABLE test_table(id serial, value text); -- notice id type serial
    CREATE TABLE
    

    Then, let's add a random attribute to an element (just to demonstrate quotes in the content):

    $ cat file
    <address>
       <street some="attribute">Main Street</street><!--attribute here-->
       <city>New York</city>
       <country>USA</country>
    </address>
    

    Now some awk to slurp in the whole file, to duplicate all double quotes in the content and to add (or replace all) double quotes to the beginning and end:

    $ awk '                   # tested with gawk, mawk and awk version 20121220
    BEGIN {
        RS="^$"               # slurp in the whole file
    }
    {
        gsub(/"/,"\"\"")      # double the double quotes
        # gsub(/\r?\n/,"")    # to remove newlines, uncomment (untested)
        gsub(/^"*|"*$/,"\"")  # add quotes to the beginning and end
        print                 # output
    }' file
    

    Output with above awk (notice "s in the beginning and the end and ""s in the middle):

    "<address>
       <street some=""attribute"">Main Street</street>
       <city>New York</city>
       <country>USA</country>
    </address>
    "
    

    Now to process the file with awk and to feed it's output to psql:

    $ awk 'BEGIN{RS="^$"}{gsub(/"/,"\"\"");gsub(/^"*|"*$/,"\"");print}' file |
    > psql -h host -d dbname -c "\copy test_table(value) from '/dev/stdin' csv" -U username
    Password: 
    COPY 1
    

    And finally:

    # select * from test_table ;
     id |                      value                       
    ----+--------------------------------------------------
      1 | <address>                                       +
        |    <street some="attribute">Main Street</street>+
        |    <city>New York</city>                        +
        |    <country>USA</country>                       +
        | </address>                                      +
        | 
    (1 row)
    

    This was tested in Linux using Bash shell.