I have a requirement where I am receiving a XML file on a Unix path. I want to copy the content of this xml file in a Postgres table in one column.
Example file received at : /home/bojack/test.xml
<address>
<street>Main Street</street>
<city>New York</city>
<country>USA</country>
</address>
Want to copy this to a table with 2 columns,
id value
1 <address>
<street>Main Street</street>
<city>New York</city>
<country>US</country>
</address>
Could anyone please help on this?
First the table:
# CREATE TABLE test_table(id serial, value text); -- notice id type serial
CREATE TABLE
Then, let's add a random attribute to an element (just to demonstrate quotes in the content):
$ cat file
<address>
<street some="attribute">Main Street</street><!--attribute here-->
<city>New York</city>
<country>USA</country>
</address>
Now some awk to slurp in the whole file, to duplicate all double quotes in the content and to add (or replace all) double quotes to the beginning and end:
$ awk ' # tested with gawk, mawk and awk version 20121220
BEGIN {
RS="^$" # slurp in the whole file
}
{
gsub(/"/,"\"\"") # double the double quotes
# gsub(/\r?\n/,"") # to remove newlines, uncomment (untested)
gsub(/^"*|"*$/,"\"") # add quotes to the beginning and end
print # output
}' file
Output with above awk (notice "
s in the beginning and the end and ""
s in the middle):
"<address>
<street some=""attribute"">Main Street</street>
<city>New York</city>
<country>USA</country>
</address>
"
Now to process the file with awk and to feed it's output to psql:
$ awk 'BEGIN{RS="^$"}{gsub(/"/,"\"\"");gsub(/^"*|"*$/,"\"");print}' file |
> psql -h host -d dbname -c "\copy test_table(value) from '/dev/stdin' csv" -U username
Password:
COPY 1
And finally:
# select * from test_table ;
id | value
----+--------------------------------------------------
1 | <address> +
| <street some="attribute">Main Street</street>+
| <city>New York</city> +
| <country>USA</country> +
| </address> +
|
(1 row)
This was tested in Linux using Bash shell.