XSL Transform of Nested Attributes


My task is to extract the Text, and any associated Font attributes, from the following (simplified) example XML and import them into a FileMaker database. Example XML:

<Font Id="Arial" Script="normal" Size="32" Underlined="no" Italic="no" Weight="normal">
    <Paragraph>
        <Text>This <Font Italic="yes">word</Font> is italic</Text>
        <Text>This entire line has no formatting</Text>
        <Text>This<Font Italic="yes">line</Font><Font Underlined="yes" Italic = "yes"> has multiple formats</Font></Text>
    </Paragraph>

    <Paragraph>
        <Text>This is the first line of the second paragraph and has no formatting</Text>
        <Text>This line also has no formatting</Text>
        <Text><Font Underlined="yes">This entire line is underlined</Font></Text>
    </Paragraph>
</Font>

As you can see, the <Paragraph> elements are enclosed by a <Font> node (I hope I am referring to these parts correctly). I have been successful in writing code to transfer individual FULL lines of text into the database, along with the nested Font attribute(s), when there are either NO nested Font attributes or the nested Font attributes enclose the ENTIRE Text data. What I'm stuck on is how to deal with lines of text that have nested attributes WITHIN the text data, such as the first and third lines of the first paragraph.

What I'm looking to do is capture each snippet of data, along with its attributes. My schema allows for up to three nested Font attributes per line of text (a, b, c). Using the example XML file, my FileMaker database should look like this (simplified) for Paragraph 1:

Record 1
Line 1a Text: This
Line 1a Italic: (no value)
Line 1a Underlined: (no value)

Line 1b Text: word
Line 1b Italic: yes
Line 1b Underlined: (no value)

Line 1c Text: is italic
Line 1c Italic: (no value)
Line 1c Underlined: (no value)

Line 2a Text: This entire line has no formatting
Line 2a Italic: (no value)
Line 2a Underlined: (no value)

Line 2b Text: (no value)
Line 2b Italic: (no value)
Line 2b Underlined: (no value)

Line 2c Text: (no value)
Line 2c Italic: (no value)
Line 2c Underlined: (no value)

Line 3a Text: This
Line 3a Italic: (no value)
Line 3a Underlined: (no value)

Line 3b Text: line
Line 3b Italic: yes
Line 3b Underlined: (no value)

Line 3c Text: has multiple formats
Line 3c Italic: yes
Line 3c Underlined: yes

Of course, I will not be able to predict when and where formatting will be applied. I hope I have been clear, and thank you very much, in advance, for any pointers that you can provide to help me accomplish this task.


Solution

  • I would suggest you try something like this, at least as your starting point:

    XSLT

    <xsl:stylesheet version="1.0" 
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
    <xsl:strip-space elements="*"/>
    
    <xsl:template match="/Font">
        <FMPXMLRESULT xmlns="http://www.filemaker.com/fmpxmlresult">
            <METADATA>
                <FIELD NAME="Text" TYPE="TEXT"/>
                <!-- the attribute values are "yes"/"no", so these are text fields -->
                <FIELD NAME="IsItalic" TYPE="TEXT"/>
                <FIELD NAME="IsUnderline" TYPE="TEXT"/>
                <FIELD NAME="Paragraph" TYPE="TEXT"/>
            </METADATA>
            <RESULTSET>
                <!-- create a record for each text node, descendant of Paragraph -->
                <xsl:for-each select="Paragraph//text()">
                    <ROW>
                        <!-- get the value of the current text node itself  -->
                        <COL><DATA><xsl:value-of select="."/></DATA></COL>
                        <!-- get the value of @Italic from the nearest ancestor that has such attribute -->
                        <COL><DATA><xsl:value-of select="ancestor::*[@Italic][1]/@Italic"/></DATA></COL>
                        <!-- get the value of @Underlined from the nearest ancestor that has such attribute -->
                        <COL><DATA><xsl:value-of select="ancestor::*[@Underlined][1]/@Underlined"/></DATA></COL>
                        <!-- get the ID of the ancestor Paragraph -->
                        <COL><DATA><xsl:value-of select="generate-id(ancestor::Paragraph)"/></DATA></COL>
                    </ROW>
                </xsl:for-each>
            </RESULTSET>
        </FMPXMLRESULT>
    </xsl:template>
    
    </xsl:stylesheet>
    

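    Applied to your input example, this creates one record for each text node, with the Italic and Underlined values taken from the nearest ancestor that carries them and a generated ID for the enclosing Paragraph.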
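
    With the stylesheet above, text that sits outside any nested Font picks up "no" from the root Font element, whereas the layout in the question shows "(no value)" for those fragments. If empty values are preferred, one possible variation is to look only at Font ancestors that are themselves inside a Text element. This is a minimal sketch under that assumption, not a tested replacement; it reuses the field names from the stylesheet above:

    ```xml
    <xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
    <xsl:strip-space elements="*"/>

    <xsl:template match="/Font">
        <FMPXMLRESULT xmlns="http://www.filemaker.com/fmpxmlresult">
            <METADATA>
                <FIELD NAME="Text" TYPE="TEXT"/>
                <FIELD NAME="IsItalic" TYPE="TEXT"/>
                <FIELD NAME="IsUnderline" TYPE="TEXT"/>
                <FIELD NAME="Paragraph" TYPE="TEXT"/>
            </METADATA>
            <RESULTSET>
                <xsl:for-each select="Paragraph//text()">
                    <ROW>
                        <COL><DATA><xsl:value-of select="."/></DATA></COL>
                        <!-- assumption: only Font ancestors nested inside a Text element count; the root Font is ignored -->
                        <COL><DATA><xsl:value-of select="ancestor::Font[ancestor::Text][@Italic][1]/@Italic"/></DATA></COL>
                        <COL><DATA><xsl:value-of select="ancestor::Font[ancestor::Text][@Underlined][1]/@Underlined"/></DATA></COL>
                        <COL><DATA><xsl:value-of select="generate-id(ancestor::Paragraph)"/></DATA></COL>
                    </ROW>
                </xsl:for-each>
            </RESULTSET>
        </FMPXMLRESULT>
    </xsl:template>

    </xsl:stylesheet>
    ```

    Because `ancestor::` is a reverse axis, `[1]` still selects the nearest matching Font. For a fragment with no nested Font, the node-set is empty and both cells come out empty.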

    Note that the paragraph IDs produced by generate-id() are unique within the scope of the current transformation only, not universally; another run may produce different values.
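
    If the paragraph value has to stay the same from one run to the next, one option is to number the paragraphs by their position in the document instead of calling generate-id(). The following is a minimal sketch, assuming a plain sequence number is acceptable as an identifier and that Paragraph elements are not nested; the field name ParagraphNo is hypothetical and the formatting columns are left out for brevity:

    ```xml
    <xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
    <xsl:strip-space elements="*"/>

    <xsl:template match="/Font">
        <FMPXMLRESULT xmlns="http://www.filemaker.com/fmpxmlresult">
            <METADATA>
                <FIELD NAME="Text" TYPE="TEXT"/>
                <!-- ParagraphNo is a hypothetical field name, not part of the original stylesheet -->
                <FIELD NAME="ParagraphNo" TYPE="NUMBER"/>
            </METADATA>
            <RESULTSET>
                <xsl:for-each select="Paragraph//text()">
                    <ROW>
                        <COL><DATA><xsl:value-of select="."/></DATA></COL>
                        <!-- 1-based position of the enclosing Paragraph among its Paragraph siblings -->
                        <COL><DATA><xsl:value-of select="count(ancestor::Paragraph/preceding-sibling::Paragraph) + 1"/></DATA></COL>
                    </ROW>
                </xsl:for-each>
            </RESULTSET>
        </FMPXMLRESULT>
    </xsl:template>

    </xsl:stylesheet>
    ```

    The number is stable only as long as the order of paragraphs in the source file does not change.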


    Added:

    Against my better judgment, here's a stylesheet that will create a record for each Paragraph, with each record being a strict grid of 3 lines by 3 text nodes.

    <xsl:stylesheet version="1.0" 
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.filemaker.com/fmpxmlresult">
    <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
    <xsl:strip-space elements="*"/>
    
    <xsl:template match="/Font">
        <FMPXMLRESULT>
            <METADATA>
                <FIELD NAME="Line 1a Text"/>
                <FIELD NAME="Line 1a Italic"/>
                <FIELD NAME="Line 1a Underlined"/>
    
                <FIELD NAME="Line 1b Text"/>
                <FIELD NAME="Line 1b Italic"/>
                <FIELD NAME="Line 1b Underlined"/>
    
                <FIELD NAME="Line 1c Text"/>
                <FIELD NAME="Line 1c Italic"/>
                <FIELD NAME="Line 1c Underlined"/>
    
                <FIELD NAME="Line 2a Text"/>
                <FIELD NAME="Line 2a Italic"/>
                <FIELD NAME="Line 2a Underlined"/>
                <FIELD NAME="Line 2b Text"/>
                <FIELD NAME="Line 2b Italic"/>
                <FIELD NAME="Line 2b Underlined"/>
    
                <FIELD NAME="Line 2c Text"/>
                <FIELD NAME="Line 2c Italic"/>
                <FIELD NAME="Line 2c Underlined"/>
    
                <FIELD NAME="Line 3a Text"/>
                <FIELD NAME="Line 3a Italic"/>
                <FIELD NAME="Line 3a Underlined"/>
    
                <FIELD NAME="Line 3b Text"/>
                <FIELD NAME="Line 3b Italic"/>
                <FIELD NAME="Line 3b Underlined"/>
    
                <FIELD NAME="Line 3c Text"/>
                <FIELD NAME="Line 3c Italic"/>
                <FIELD NAME="Line 3c Underlined"/>
            </METADATA>
            <RESULTSET>
                <!-- create a record for each Paragraph -->
                <xsl:for-each select="Paragraph">
                    <ROW>
                        <!-- for each line ...  -->
                        <xsl:for-each select="Text">
                            <xsl:variable name="text-nodes" select=".//text()" />
                            <!-- process the first three text nodes  -->
                            <xsl:call-template name="create-cells">
                                <xsl:with-param name="text-node" select="$text-nodes[1]"/>
                            </xsl:call-template>
                            <xsl:call-template name="create-cells">
                                <xsl:with-param name="text-node" select="$text-nodes[2]"/>
                            </xsl:call-template>
                            <xsl:call-template name="create-cells">
                                <xsl:with-param name="text-node" select="$text-nodes[3]"/>
                            </xsl:call-template>
                        </xsl:for-each> 
                    </ROW>
                </xsl:for-each>
            </RESULTSET>
        </FMPXMLRESULT>
    </xsl:template>
    
    <xsl:template name="create-cells">
        <xsl:param name="text-node"/>
        <!-- get the value of the text node itself  -->
        <COL><DATA><xsl:value-of select="$text-node"/></DATA></COL>
        <!-- get the value of @Italic from the nearest ancestor that has such attribute -->
        <COL><DATA><xsl:value-of select="$text-node/ancestor::*[@Italic][1]/@Italic"/></DATA></COL>
        <!-- get the value of @Underlined from the nearest ancestor that has such attribute -->
        <COL><DATA><xsl:value-of select="$text-node/ancestor::*[@Underlined][1]/@Underlined"/></DATA></COL>
    </xsl:template>
    
    </xsl:stylesheet>
    

    Imported into FileMaker, the result is one record per Paragraph (two records for the example input), each holding the 3 by 3 grid of Text, Italic and Underlined values.
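
    The grid stays rectangular across each line because `$text-nodes[2]` and `$text-nodes[3]` are empty node-sets when a line has fewer fragments, and `xsl:value-of` on an empty node-set writes an empty string. The cut-down sketch below isolates that behaviour with one record per line and only the three text cells. It also wraps each value in `normalize-space()` to trim the stray spaces around fragments such as "This " in the example, which is my own addition; the field names Part a, Part b and Part c are hypothetical:

    ```xml
    <xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.filemaker.com/fmpxmlresult">
    <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
    <xsl:strip-space elements="*"/>

    <xsl:template match="/Font">
        <FMPXMLRESULT>
            <METADATA>
                <!-- hypothetical field names; one record per line of text -->
                <FIELD NAME="Part a"/>
                <FIELD NAME="Part b"/>
                <FIELD NAME="Part c"/>
            </METADATA>
            <RESULTSET>
                <xsl:for-each select="Paragraph/Text">
                    <xsl:variable name="text-nodes" select=".//text()"/>
                    <ROW>
                        <!-- a missing 2nd or 3rd text node is an empty node-set, so its cell is written empty -->
                        <COL><DATA><xsl:value-of select="normalize-space($text-nodes[1])"/></DATA></COL>
                        <COL><DATA><xsl:value-of select="normalize-space($text-nodes[2])"/></DATA></COL>
                        <COL><DATA><xsl:value-of select="normalize-space($text-nodes[3])"/></DATA></COL>
                    </ROW>
                </xsl:for-each>
            </RESULTSET>
        </FMPXMLRESULT>
    </xsl:template>

    </xsl:stylesheet>
    ```

    Note that the same padding does not apply to the number of lines: in the grid stylesheet, a Paragraph with fewer or more than three Text elements produces fewer or more than 27 cells in its row, and a fourth fragment within a line is silently dropped.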