I am on a red hat system and I have multiple XML files generated from various SOAP requests that are in a format that is not compatible with MySQL's LoadXML function. I need to load the data into MySQL tables. One table will be setup for each type of XML file, depending on the data received via the Soap XML API.
Sample format of one of the files is as this, but each file will have a different number of columns and different column names. I am trying to find a way to convert them to a compatible format in the most generic way possible since I will have to create any customized solution for each API request/response.
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<soap:Body>
<dbd:DataRetrievalRequestResponse xmlns:dbd="dbd.v1">
<DataObjects>
<ObjectSelect>
<mdNm>controller-ac</mdNm>
<meNm>WALL-EQPT-A</meNm>
</ObjectSelect>
<DataInstances>
<DataInstance>
<instanceId>DSS1</instanceId>
<Attribute>
<name>Name</name>
<value>DSS1</value>
</Attribute>
<Attribute>
<name>Operational Mode</name>
<value>mode-fast</value>
</Attribute>
<Attribute>
<name>Rate - Down</name>
<value>1099289</value>
</Attribute>
<Attribute>
<name>Rate - Up</name>
<value>1479899</value>
</Attribute>
</DataInstance>
<DataInstance>
<instanceId>DSS2</instanceId>
<Attribute>
<name>Name</name>
<value>DSS2</value>
</Attribute>
<Attribute>
<name>Operational Mode</name>
<value>mode-fast</value>
</Attribute>
<Attribute>
<name>Rate - Down</name>
<value>1299433</value>
</Attribute>
<Attribute>
<name>Rate - Up</name>
<value>1379823</value>
</Attribute>
</DataInstance>
</DataInstances>
</DataObjects>
</dbd:DataRetrievalRequestResponse>
</soap:Body>
</soap:Envelope>
Of course I want the data to be entered into a mysql table with column names 'id, Name, Group' rows for each unique instance
Name | Operational Mode | Rate - Down | Rate - Up |
---|---|---|---|
DSS1 | mode-fast | 1099289 | 1479899 |
DSS2 | mode-fast | 1299433 | 1379823 |
Do I need to create an XSLT and preprocess this XML data from command line prior to running it to LoadXML to get it into a format that MySQL LoadXML function will accept? This would not be a problem, but I am not familiar with XSLT transformations.
Is there a way to reformat the above XML to straight CSV (preferred), or to another XML format that is compatible, such as the examples given in mysql documentation for loadxml?
<row>
<field name='column1'>value1</field>
<field name='column2'>value2</field>
</row>
I tried doing LOAD DATA INFILE and using ExtractValue function, but some of the values have spaces in them, and the delimiter for ExtractValue is hard coded to single-space. This makes it unusable as a workaround.
Your question is very general (which is fine!) so my answer is also quite general.
Firstly, it's certainly true that XSLT is an ideal generic tool for problems of this sort. I have absolutely no doubt that every one of your SOAP messages could be coerced into a suitable form, using an XSLT that's customised for each type of message, while still remaining structurally very similar, which is what you'd want if you're new to XSLT.
I'm not sure how familiar you are with XPath, XML, XML namespaces, etc, but I think the task here is simple enough to tackle, and if you do have any tricky XPath expressions to write you can always come back to StackOverflow and ask for help.
From what you've said it sounds like you're confident that each SOAP message can be mapped to a single table. I'm going to suggest an XSLT pattern that would be customisable for each type of SOAP message, where you have an xsl:for-each
statement that iterates over each row, and within that you create a row
element and populate it with fields.
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<!-- indent the output, for ease of reading -->
<xsl:output indent="yes"/>
<!-- process the document -->
<xsl:template match="/">
<!-- create the root element of the output -->
<resultset>
<!-- create each row of the output, by iterating over the
repeating elements in the SOAP message -->
<xsl:for-each
select="//DataInstance">
<row>
<!-- create each field -->
<!-- This field is defined individually, and the value
is produced by evaluating the 'instanceId' xpath
relative to the current DataInstance -->
<field name="id"><xsl:value-of select="instanceId"/></field>
<!-- these field can be generated with a loop -->
<xsl:for-each select="Attribute">
<field name="{name}"><xsl:value-of select="value"/></field>
</xsl:for-each>
</row>
</xsl:for-each>
</resultset>
</xsl:template>
</xsl:stylesheet>
Result of this, run over your sample SOAP message:
<resultset>
<row>
<field name="id">DSS1</field>
<field name="Name">DSS1</field>
<field name="Operational Mode">mode-fast</field>
<field name="Rate - Down">1099289</field>
<field name="Rate - Up">1479899</field>
</row>
<row>
<field name="id">DSS2</field>
<field name="Name">DSS2</field>
<field name="Operational Mode">mode-fast</field>
<field name="Rate - Down">1299433</field>
<field name="Rate - Up">1379823</field>
</row>
</resultset>
If you can follow this general pattern, you should be able to write a custom XSLT for every kind of SOAP message in your collection. You will just need to modify the various XPath expressions in the stylesheet:
//DataInstance
means "every DataInstance"
instanceId
means "the instanceId that's a child of the current ("context") element.
name
means "the name element that's a child of the current element.
value
means "the value element that's a child of the current element.
In the example SOAP message you gave, the Attribute
element maps to a field, so all those elements could be copied generically, with another xsl:for-each
, but for your other documents you may have to just define each field
element individually, as I did for the id
element in my answer.