Tags: mysql, linux, xml, xslt, rhel

How do I reformat XML to work with MySQL LOAD XML?


I am on a Red Hat system and I have multiple XML files, generated from various SOAP requests, in a format that is not compatible with MySQL's LOAD XML statement. I need to load the data into MySQL tables. One table will be set up for each type of XML file, depending on the data received via the SOAP XML API.

A sample of one of the files is shown below, but each file will have a different number of columns and different column names. I am trying to find a way to convert them to a compatible format in the most generic way possible, since otherwise I will have to create a customized solution for each API request/response.

<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
   <soap:Body>
      <dbd:DataRetrievalRequestResponse xmlns:dbd="dbd.v1">
         <DataObjects>
            <ObjectSelect>
               <mdNm>controller-ac</mdNm>
               <meNm>WALL-EQPT-A</meNm>
            </ObjectSelect>
            <DataInstances>
               <DataInstance>
                  <instanceId>DSS1</instanceId>
                  <Attribute>
                     <name>Name</name>
                     <value>DSS1</value>
                  </Attribute>
                  <Attribute>
                     <name>Operational Mode</name>
                     <value>mode-fast</value>
                  </Attribute>
                  <Attribute>
                     <name>Rate - Down</name>
                     <value>1099289</value>
                  </Attribute>
                  <Attribute>
                     <name>Rate - Up</name>
                     <value>1479899</value>
                  </Attribute>
                </DataInstance>
                <DataInstance>
                  <instanceId>DSS2</instanceId>
                  <Attribute>
                     <name>Name</name>
                     <value>DSS2</value>
                  </Attribute>
                  <Attribute>
                     <name>Operational Mode</name>
                     <value>mode-fast</value>
                  </Attribute>
                  <Attribute>
                     <name>Rate - Down</name>
                     <value>1299433</value>
                  </Attribute>
                  <Attribute>
                     <name>Rate - Up</name>
                     <value>1379823</value>
                  </Attribute>
                </DataInstance>
             </DataInstances>
          </DataObjects>
       </dbd:DataRetrievalRequestResponse>
    </soap:Body>
 </soap:Envelope>

Of course, I want the data entered into a MySQL table with column names such as 'id, Name, Group', with one row for each unique instance:

Name | Operational Mode | Rate - Down | Rate - Up
DSS1 | mode-fast        | 1099289     | 1479899
DSS2 | mode-fast        | 1299433     | 1379823

Do I need to create an XSLT and preprocess this XML data from the command line, before passing it to LOAD XML, to get it into a format that MySQL will accept? This would not be a problem, but I am not familiar with XSLT transformations.

Is there a way to reformat the above XML to straight CSV (preferred), or to another compatible XML format, such as the examples given in the MySQL documentation for LOAD XML?

<row>
  <field name='column1'>value1</field>
  <field name='column2'>value2</field>
</row>

I tried using LOAD DATA INFILE with the ExtractValue function, but some of the values have spaces in them, and the delimiter for ExtractValue is hard-coded to a single space. This makes it unusable as a workaround.


Solution

  • Your question is very general (which is fine!) so my answer is also quite general.

    Firstly, it's certainly true that XSLT is an ideal generic tool for problems of this sort. I have absolutely no doubt that every one of your SOAP messages could be coerced into a suitable form, using an XSLT that's customised for each type of message, while still remaining structurally very similar, which is what you'd want if you're new to XSLT.

    I'm not sure how familiar you are with XPath, XML, XML namespaces, etc., but I think the task here is simple enough to tackle, and if you do have any tricky XPath expressions to write you can always come back to Stack Overflow and ask for help.

    From what you've said it sounds like you're confident that each SOAP message can be mapped to a single table. I'm going to suggest an XSLT pattern that would be customisable for each type of SOAP message, where you have an xsl:for-each statement that iterates over each row, and within that you create a row element and populate it with fields.

    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
       <!-- indent the output, for ease of reading -->
       <xsl:output indent="yes"/>
       <!-- process the document -->
       <xsl:template match="/">
          <!-- create the root element of the output -->
          <resultset>
             <!-- create each row of the output, by iterating over the
             repeating elements in the SOAP message -->
             <xsl:for-each 
                select="//DataInstance">
                <row>
                   <!-- create each field -->
                   <!-- This field is defined individually, and the value
                        is produced by evaluating the 'instanceId' xpath
                        relative to the current DataInstance -->
                   <field name="id"><xsl:value-of select="instanceId"/></field>
                   <!-- these fields can be generated with a loop -->
                   <xsl:for-each select="Attribute">
                      <field name="{name}"><xsl:value-of select="value"/></field>
                   </xsl:for-each>
                </row>
             </xsl:for-each>
          </resultset>
       </xsl:template>
    </xsl:stylesheet>
    
    

    Result of this, run over your sample SOAP message:

    <resultset>
       <row>
          <field name="id">DSS1</field>
          <field name="Name">DSS1</field>
          <field name="Operational Mode">mode-fast</field>
          <field name="Rate - Down">1099289</field>
          <field name="Rate - Up">1479899</field>
       </row>
       <row>
          <field name="id">DSS2</field>
          <field name="Name">DSS2</field>
          <field name="Operational Mode">mode-fast</field>
          <field name="Rate - Down">1299433</field>
          <field name="Rate - Up">1379823</field>
       </row>
    </resultset>
    

    If you can follow this general pattern, you should be able to write a custom XSLT for every kind of SOAP message in your collection. You will just need to modify the various XPath expressions in the stylesheet:

      • //DataInstance means "every DataInstance element, anywhere in the document".
      • instanceId means "the instanceId element that's a child of the current ("context") element".
      • name means "the name element that's a child of the current element".
      • value means "the value element that's a child of the current element".
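    Since the question states a preference for CSV, the same iteration pattern can also emit text instead of XML. The following is a minimal sketch, not a definitive implementation. It assumes that every DataInstance carries the same Attribute elements in the same order, and that no value contains a double quote (nothing here escapes quotes). The header line is taken from the first DataInstance.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
   <!-- emit plain text rather than XML -->
   <xsl:output method="text" encoding="UTF-8"/>
   <xsl:template match="/">
      <!-- header line: "id" plus the attribute names of the first DataInstance
           (assumption: all instances share the same attributes, in the same order) -->
      <xsl:text>"id"</xsl:text>
      <xsl:for-each select="(//DataInstance)[1]/Attribute">
         <xsl:text>,"</xsl:text><xsl:value-of select="name"/><xsl:text>"</xsl:text>
      </xsl:for-each>
      <xsl:text>&#10;</xsl:text>
      <!-- one CSV line per DataInstance -->
      <xsl:for-each select="//DataInstance">
         <xsl:text>"</xsl:text><xsl:value-of select="instanceId"/><xsl:text>"</xsl:text>
         <!-- assumption: values contain no double quotes, so no escaping is done -->
         <xsl:for-each select="Attribute">
            <xsl:text>,"</xsl:text><xsl:value-of select="value"/><xsl:text>"</xsl:text>
         </xsl:for-each>
         <xsl:text>&#10;</xsl:text>
      </xsl:for-each>
   </xsl:template>
</xsl:stylesheet>
```

    Run over the sample SOAP message, this should produce a header line followed by one quoted, comma-separated line per DataInstance. On the command line, an XSLT 1.0 processor such as xsltproc can apply it, for example xsltproc to-csv.xsl response.xml > response.csv (both file names are hypothetical). The result could then be loaded with LOAD DATA INFILE, using FIELDS TERMINATED BY ',' ENCLOSED BY '"' and IGNORE 1 LINES to skip the header.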

    In the example SOAP message you gave, the Attribute element maps to a field, so all those elements could be copied generically, with another xsl:for-each, but for your other documents you may have to just define each field element individually, as I did for the id element in my answer.
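    As an illustration of defining each field individually, here is a hedged sketch of the same stylesheet with explicit fields and a fully qualified path in place of //DataInstance. The namespace URIs are copied from the sample message; the column names rate_down and rate_up are hypothetical, chosen only to show that the output field name need not match the name in the source.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
                xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
                xmlns:dbd="dbd.v1"
                exclude-result-prefixes="soap dbd">
   <xsl:output indent="yes"/>
   <xsl:template match="/">
      <resultset>
         <!-- explicit path: the prefixed elements need their namespaces declared above;
              DataObjects and its descendants are in no namespace in the sample -->
         <xsl:for-each select="/soap:Envelope/soap:Body/dbd:DataRetrievalRequestResponse/DataObjects/DataInstances/DataInstance">
            <row>
               <field name="id"><xsl:value-of select="instanceId"/></field>
               <!-- pick one Attribute by the text of its name child;
                    rate_down and rate_up are hypothetical column names -->
               <field name="rate_down"><xsl:value-of select="Attribute[name='Rate - Down']/value"/></field>
               <field name="rate_up"><xsl:value-of select="Attribute[name='Rate - Up']/value"/></field>
            </row>
         </xsl:for-each>
      </resultset>
   </xsl:template>
</xsl:stylesheet>
```

    The predicate Attribute[name='Rate - Down'] selects the Attribute whose name child has exactly that text, so each field no longer depends on the order of the Attribute elements. Renaming fields this way also lets you avoid spaces in the target column names.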