I reviwed the GreenPlum DBA guide. It talks about processing XML messages using STX tranformation.
If we have an XSD and we need to process the XML messages that was compliant with XSD.
How do we approach this situation in GreenPlum?
What are the best approaches?
Are there any other approaches to process XSD based /XML in Greenplum?
Thanks for your insights.
In general, there's no built-in functionality for processing XML files in Greenplum. The thing you are talking about is the gpfdist client-side transformation that allows you to convert your XML to CSV and then process it inside of the database.
The best option in processing the XMLs inside of the Greenplum would be to decompose them to the relational model and load as a set of tables. If you need to process just a general XML files, you can write a function in PL/Java or PL/Python that would parse XML for you and extract the information you need.
Here's an example of the function:
create or replace function parse (inxml varchar, node varchar) returns varchar as $BODY$
from xml.etree import ElementTree
def getText (node, subnode):
res = None
if node is not None:
sn = node.find(subnode)
if sn is not None:
res = sn.text
else:
for el in node:
res2 = getText(el, subnode)
if res2 is not None:
res = res2
break
return res
et = ElementTree.fromstring(inxml.strip())
return getText(et, node)
$BODY$
language plpythonu
volatile;
And here's an example of its invocation:
select parse (
'<?xml version="1.0" encoding="utf-8"?>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Dont forget me this weekend!</body>
<attachments>
<attachment>file1</attachment>
<attachment>file2</attachment>
<attachment>file3</attachment>
</attachments>
</note>',
'attachment');