Search code examples
xmlxslt-1.0filemakerxslt-grouping

XSLT to regroup XML data dynamically from FMPXMLRESULT using only XSLT 1.0


I need to transform XML data from one structure to another. I need the target XML to be structured based on metadata inside the source XML. The source XML has a fixed structure, however the structure of the target XML needs to be built dynamically (including tag names and data groupings) based on metadata in the source XML.

The following source XML provides an example of the structure. The source XML will always use the FMPXMLRESULT structure.

XML

<?xml version="1.0" encoding="UTF-8" ?>
<FMPXMLRESULT xmlns="http://www.filemaker.com/fmpxmlresult">
  <METADATA>
    <FIELD EMPTYOK="YES" MAXREPEAT="1" NAME="ID" TYPE="NUMBER"/>
    <FIELD EMPTYOK="YES" MAXREPEAT="1" NAME="Description" TYPE="TEXT"/>
    <FIELD EMPTYOK="YES" MAXREPEAT="1" NAME="Customer" TYPE="TEXT"/>
    <FIELD EMPTYOK="YES" MAXREPEAT="1" NAME="OrderItem::ProductName" TYPE="TEXT"/>
    <FIELD EMPTYOK="YES" MAXREPEAT="1" NAME="OrderItem::UnitPrice" TYPE="NUMBER"/>
    <FIELD EMPTYOK="YES" MAXREPEAT="1" NAME="OrderItem::Quantity" TYPE="NUMBER"/>
    <FIELD EMPTYOK="YES" MAXREPEAT="1" NAME="OrderItem::TaxCode" TYPE="TEXT"/>
    <FIELD EMPTYOK="YES" MAXREPEAT="1" NAME="OrderItem::Total" TYPE="NUMBER"/>
    <FIELD EMPTYOK="YES" MAXREPEAT="1" NAME="OrderTaxCode::TaxCode" TYPE="TEXT"/>
    <FIELD EMPTYOK="YES" MAXREPEAT="1" NAME="OrderTaxCode::TaxRate" TYPE="NUMBER"/>
  </METADATA>
  <RESULTSET FOUND="2">
    <ROW MODID="1" RECORDID="1">
      <COL><DATA>1</DATA></COL>
      <COL><DATA>Order for first project</DATA></COL>
      <COL><DATA>Customer No 1</DATA></COL>
      <COL>
        <DATA>Product A</DATA>
        <DATA>Product B</DATA>
      </COL>
      <COL>
        <DATA>10.50</DATA>
        <DATA>12.10</DATA>
      </COL>
      <COL>
        <DATA>2</DATA>
        <DATA>1</DATA>
      </COL>
      <COL>
        <DATA>VAT</DATA>
        <DATA>VAT0</DATA>
      </COL>
      <COL>
        <DATA>21</DATA>
        <DATA>12.1</DATA>
      </COL>
      <COL>
        <DATA>VAT</DATA>
        <DATA>VAT0</DATA>
      </COL>
      <COL>
        <DATA>0.2</DATA>
        <DATA>0</DATA>
      </COL>
    </ROW>
    <ROW MODID="1" RECORDID="2">
      <COL><DATA>2</DATA></COL>
      <COL><DATA>Order for second project</DATA></COL>
      <COL><DATA>Customer No 2</DATA></COL>
      <COL>
        <DATA>Product 2A</DATA>
        <DATA>Product 2B</DATA>
      </COL>
      <COL>
        <DATA>1.50</DATA>
        <DATA>345</DATA>
      </COL>
      <COL>
        <DATA>17</DATA>
        <DATA>2</DATA>
      </COL>
      <COL>
        <DATA>VAT0</DATA>
        <DATA>VAT</DATA>
      </COL>
      <COL>
        <DATA>25.5</DATA>
        <DATA>690</DATA>
      </COL>
      <COL>
        <DATA>VAT</DATA>
        <DATA>VAT0</DATA>
      </COL>
      <COL>
        <DATA>0.2</DATA>
        <DATA>0</DATA>
      </COL>
    </ROW>
  </RESULTSET>
</FMPXMLRESULT>

Given the above XML (including metadata) the target XML format would be as follows.

XML

<?xml version="1.0" encoding="UTF-8"?>
<OrderBatch>
  
  <Order>
    <ID>1</ID>
    <Description>Order for first project</Description>
    <Customer>Customer No 1</Customer>
    <OrderItem>
      <ProductName>Product A</ProductName>
      <UnitPrice>10.50</UnitPrice>
      <Quantity>2</Quantity>
      <TaxCode>VAT</TaxCode>
      <Total>21</Total>
    </OrderItem>
    <OrderItem>
      <ProductName>Product B</ProductName>
      <UnitPrice>12.10</UnitPrice>
      <Quantity>1</Quantity>
      <TaxCode>VAT0</TaxCode>
      <Total>12.1</Total>
    </OrderItem>
    <OrderTaxCode>
      <TaxCode>VAT</TaxCode>
      <TaxRate>0.2</TaxRate>
    </OrderTaxCode>
    <OrderTaxCode>
      <TaxCode>VAT0</TaxCode>
      <TaxRate>0</TaxRate>
    </OrderTaxCode>
  </Order>

  <Order>
    <ID>2</ID>
    <Description>Order for second project</Description>
    <Customer>Customer No 2</Customer>
    <OrderItem>
      <ProductName>Product 2A</ProductName>
      <UnitPrice>1.50</UnitPrice>
      <Quantity>17</Quantity>
      <TaxCode>VAT0</TaxCode>
      <Total>25.5</Total>
    </OrderItem>
    <OrderItem>
      <ProductName>Product 2B</ProductName>
      <UnitPrice>345</UnitPrice>
      <Quantity>2</Quantity>
      <TaxCode>VAT</TaxCode>
      <Total>690</Total>
    </OrderItem>
    <OrderTaxCode>
      <TaxCode>VAT</TaxCode>
      <TaxRate>0.2</TaxRate>
    </OrderTaxCode>
    <OrderTaxCode>
      <TaxCode>VAT0</TaxCode>
      <TaxRate>0</TaxRate>
    </OrderTaxCode>
  </Order>

</OrderBatch>

Different metadata in the source XML would produce a different target XML. The general rules are as follows

  1. The field names are contained in METADATA and the data is contained in RESULTSET
  2. The COL nodes in each ROW of RESULTSET correspond to the FIELD nodes in METADATA by position
  3. Any RESULTSET/FIELD where the NAME contains the text "::" should be considered as 'grouped' data
  4. Grouped data should have a group name equal to the text preceding "::" (this symbol will only ever appear once in a FIELD NAME)
  5. Grouped data COL nodes may contain 0, 1 or more DATA child nodes
  6. FIELD nodes that are not grouped (i.e. NAME doesn't contain "::") will always have exactly 1 DATA child node in the COL node
  7. Grouped data fields will always be adjacent (e.g. all fields in the group ORDERITEM:: won't have fields from other groups in between them in the field order)
  8. The group names, field names and field order are not known in advance and may change, the XSLT needs to handle this dynamically. The above XML is a good example of the sort of thing that will need to be handled
  9. I can only use XSLT 1.0
  10. FMPDSORESULT is a deprecated technology and I can't use it

The 2 major sticking points are

  1. Pulling DATA nodes out of COL nodes by position and assigning to their own groups
  2. Achieving what is needed with the metadata separate

I've tried a number of approaches with nested for-each loops, and different ways of organising templates. I wonder if perhaps creating an internal data structure might be the way to go, but I also may be looking at the problem wrong?

Here is the best I've come up with so far, it's the closest I've come but still not nearly close enough

XSLT 1.0

<xsl:stylesheet
  version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:fmp="http://www.filemaker.com/fmpxmlresult"
  exclude-result-prefixes="fmp"
>
  <xsl:output indent="yes"/>


  <!-- the key indexes the METADATA fields by their position -->
  <xsl:key
    name="fieldList"
    match="fmp:METADATA/fmp:FIELD"
    use="count(preceding-sibling::fmp:FIELD) + 1"
  />

  <!-- template for the data section of the FileMaker XML -->
  <xsl:template match="/fmp:FMPXMLRESULT">
    <OrderBatch>
      <xsl:apply-templates select="fmp:RESULTSET/fmp:ROW" />
    </OrderBatch>
  </xsl:template>

    <!-- template for each row -->
  <xsl:template match="fmp:ROW">
        <!-- for each row, create Order element and apply the relevant template for each column -->
    <Order>
      <xsl:apply-templates select="fmp:COL" />
    </Order>
  </xsl:template>

  <!-- template for each column within each row -->
  <xsl:template match="fmp:COL">
        <!-- set $qualified with the name of the field - this will be qualified with the table occurrence if related -->
    <xsl:variable name="qualified" select="string(key('fieldList', position())/@NAME)"/>
    <!-- set $group with the name of the field's group -->
    <xsl:variable name="group" select="substring-before($qualified, '::')"/>
        <!-- set $name to a value for use as an XML element -->
        <xsl:variable name="name">
            <xsl:choose>

                <!-- if the qualified field is related (contains "::") then remove the table occurrence name -->
            <xsl:when test="contains($qualified, '::')">
                    <xsl:value-of select="substring-after($qualified, '::')"/>
            </xsl:when>

                <!-- if the qualified field is not related then just return the field name -->
            <xsl:otherwise>
                    <xsl:value-of select="$qualified"/>
            </xsl:otherwise>
            </xsl:choose>
        </xsl:variable>



    <!-- create the element with the field's name and use the data as the element's value -->
    <xsl:choose>
      <!-- related element - need to group -->
      <xsl:when test="contains($qualified, '::')">
        <!-- group each DATA element in turn -->
        <!-- actually only need to run this on the first COL in a group - but I'll figure that out later -->
        <xsl:for-each select="fmp:DATA">
          <xsl:apply-templates select=".">
            <xsl:with-param name="pGroup" select="$group" />
            <xsl:with-param name="pName" select="$name" />
          </xsl:apply-templates>
        </xsl:for-each>
      </xsl:when>

      <!-- element is at top level so just create the field/value -->
      <xsl:otherwise>
        <xsl:element name="{$name}">
                <xsl:value-of select="." />
            </xsl:element>
        </xsl:otherwise>
        </xsl:choose>

    </xsl:template>

  <!-- template for grouping DATA nodes across multiple COL nodes -->
  <xsl:template match="fmp:DATA">
    <xsl:param name = "pGroup" />
    <xsl:param name = "pName" />
    <xsl:element name="{$pGroup}">
      <xsl:variable name="pos" select="position()" />
        <xsl:apply-templates select="../../fmp:COL" mode="group">
          <xsl:with-param name="pGroup" select="$pGroup" />
          <xsl:with-param name="pName" select="$pName" />
          <xsl:with-param name="pos" select="$pos" />
        </xsl:apply-templates>

    </xsl:element>
  </xsl:template>

  <!-- template for cycling through COL nodes and getting the DATA node if it belongs to the specified group -->
  <xsl:template match="fmp:COL" mode="group">
    <xsl:param name = "pGroup" />
    <xsl:param name = "pName" />
    <xsl:param name = "pos" /> <!-- this will help select the correct DATA node - not sure how to use it yet though -->
    <xsl:variable name="qualified" select="string(key('fieldList', position())/@NAME)"/>
    <xsl:variable name="colGroup" select="substring-before($qualified, '::')"/>
    <xsl:if test="contains($qualified, '::') and $pGroup = $colGroup">
      <xsl:element name="substring-after($qualified, '::')">
                <xsl:value-of select="." />
            </xsl:element>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>

I understand that this isn't easy and isn't a normal way of using XSLT (it would normally be written to fit the target structure), however I believe this is a solvable problem and XSLT seems to be capable of far more complex tasks.

Any help on how to solve this would be very welcome. Many thanks in advance.


Solution

  • This may help you. You will need to work out the prefix to use the node-set function as noted. You can verify assumptions and make adjustments. Let me know, if you have questions.

    <xsl:stylesheet
      version="1.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      xmlns:fmp="http://www.filemaker.com/fmpxmlresult"
      xmlns:msxml="urn:schemas-microsoft-com:xslt"
      exclude-result-prefixes="fmp">
    
      <xsl:output indent="yes"/>
    
      <xsl:variable name="metadata">
        <xsl:for-each select="//fmp:FIELD">
          <xsl:element name="element">
            <xsl:variable name="name" select="@NAME"/>
            <xsl:choose>
              <xsl:when test="contains($name, '::')">
                <xsl:element name="parent">
                  <xsl:value-of select="substring-before($name, '::')"/>
                </xsl:element>
                <xsl:element name="child">
                  <xsl:value-of select="substring-after($name, '::')"/>
                </xsl:element>
              </xsl:when>
              <xsl:otherwise>
                <xsl:element name="parent">
                  <xsl:value-of select="$name"/>
                </xsl:element>
                <xsl:element name="child"/>
              </xsl:otherwise>
            </xsl:choose>  
            <xsl:element name="position">
              <xsl:value-of select="position()"/>
            </xsl:element>
        </xsl:element>
        </xsl:for-each>
      </xsl:variable>
    
      <!-- NOTE:  Replace msxml with your prefix that converts RTF's to node sets.  -->
      <xsl:variable name="metadataList" select="msxml:node-set($metadata)"/>
    
      <!-- Get a distinct list of parent elements.  -->
      <xsl:variable name="parents">
        <xsl:copy-of select="$metadataList/element[not(parent = preceding-sibling::element/parent)]"/>  
      </xsl:variable>
    
      <xsl:variable name="parentList" select="msxml:node-set($parents)"/>
    
      <xsl:template match="fmp:FMPXMLRESULT">
        <xsl:element name="OrderBatch">
          <xsl:apply-templates select="node()"/>
        </xsl:element>
      </xsl:template>
    
      <xsl:template match="fmp:ROW">
        <xsl:variable name="rowNode" select="."/> 
        <xsl:element name="Order">
          <!-- Loop thru the distinct list of parents.  -->
          <xsl:for-each select="$parentList/element">
            <xsl:variable name="parent" select="parent"/>
            <xsl:variable name="firstSetPosition" select="position"/>
            <!-- Loop thru on the first set of data nodes for this parent.  -->
            <xsl:for-each select="$rowNode/fmp:COL[number($firstSetPosition)]/fmp:DATA">
              <xsl:variable name="dataposition" select="position()"/>
              <xsl:element name="{$parent}">
                <!-- Loop thru all the child nodes for this parent.  -->
                <xsl:for-each select="$metadataList/element[parent = $parent]">
                  <xsl:variable name="position" select="position"/>
                  <xsl:variable name="child" select="string(child)"/>
                  <xsl:choose>
                    <!-- When the parent has no child nodes.  -->
                    <xsl:when test="$child = ''">
                      <xsl:value-of select="string($rowNode/fmp:COL[number($position)]/fmp:DATA[$dataposition])"/>
                    </xsl:when>
                    <xsl:otherwise>
                      <xsl:element name="{$child}">
                        <xsl:value-of select="string($rowNode/fmp:COL[number($position)]/fmp:DATA[$dataposition])"/>
                      </xsl:element>
                    </xsl:otherwise>
                  </xsl:choose>
                </xsl:for-each>
              </xsl:element>
            </xsl:for-each>
            </xsl:for-each>  
          </xsl:element>
      </xsl:template>
    
      <xsl:template match="fmp:DATA">
        <xsl:value-of select="."/>
      </xsl:template>
    
      <xsl:template match="node()">
          <xsl:apply-templates select="node()"/>
      </xsl:template>
    </xsl:stylesheet>