How can I determine if an XML node has a last ancestor that does `not` have a certain attribute?



  • Any <p> tag within the <body> tags should be transformed to Body_Text.

  • The <p> tags that have a last ancestor <sec> without the attribute "sec-type" should be transformed to Flush_Text (which overrides the first Body_Text transformation here).

  • The <p> tags that have a last ancestor <sec sec-type="irrelevant-attribute-name"> (with the attribute "sec-type") should be transformed to Body_Text.




<sec><p>asdf</p></sec> should be transformed into <sec><Flush_Text>asdf</Flush_Text></sec>.

<sec sec-type="whatevs"><p>asdf</p></sec> should be <sec sec-type="whatevs"><Body_Text>asdf</Body_Text></sec>.


Also, any further nesting into an ancestor with this sec-type attribute should still be Body_Text:

<sec sec-type="whatevs"><sec><p>asdf</p></sec></sec> should be <sec sec-type="whatevs"><sec><Body_Text>asdf</Body_Text></sec></sec>.




Here is my XML:

<root>
  <body>
  <sec sec-type="asdf">
    <title>This is an H1</title>

    <sec>
      <title>This is an H2</title>

      <sec>
        <title>This is an H3</title>
        <p>This SHOULD be "Body_Text", but it's "Flush_Text"</p>
      </sec> <!-- end of H3 -->
    </sec> <!-- end of H2 -->
  </sec> <!-- end of H1 -->

  <sec>
    <p>This is Flush_Text</p>
  </sec>
    <p>This is Body_Text</p>
  </body>
</root>


...here is my XSL, which is not working correctly:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes" method="xml"/>
<xsl:strip-space elements="*"/>

    <!-- identity rule -->
    <xsl:template match="node()|@*">
        <xsl:copy>
            <xsl:apply-templates select="node()|@*"/>
        </xsl:copy>
    </xsl:template>

        <!-- Body_Text -->
        <xsl:template match="body//p">
            <Body_Text>
                <xsl:apply-templates select="@*|node()"/>
            </Body_Text>
        </xsl:template>

        <!-- Flush_Text -->
        <xsl:template match="sec//p">
          <xsl:if test="not(@sec-type)">
            <Flush_Text>
                <xsl:apply-templates select="@*|node()"/>
            </Flush_Text>
          </xsl:if>
        </xsl:template>

        <!-- H1 -->
        <xsl:template match="sec//title">
            <H1>
                <xsl:apply-templates select="@*|node()"/>
            </H1>
        </xsl:template>

        <!-- H2 -->
        <xsl:template match="sec//sec//title">
            <H2>
                <xsl:apply-templates select="@*|node()"/>
            </H2>
        </xsl:template>

        <!-- H3 -->
        <xsl:template match="sec//sec//sec//title">
            <H3>
                <xsl:apply-templates select="@*|node()"/>
            </H3>
        </xsl:template>
</xsl:stylesheet>


...and here is the incorrect output:

<?xml version="1.0" encoding="utf-16"?>
<root>
    <body>
        <sec sec-type="asdf">
            <H1>This is an H1</H1>
            <sec>
                <H2>This is an H2</H2>
                <sec>
                    <H3>This is an H3</H3>
                    <Flush_Text>This SHOULD be "Body_Text", but it's "Flush_Text"</Flush_Text>
                </sec>
                <!-- end of H3 -->
            </sec>
            <!-- end of H2 -->
        </sec>
        <!-- end of H1 -->
        <sec>
            <Flush_Text>This is Flush_Text</Flush_Text>
        </sec>
        <Body_Text>This is Body_Text</Body_Text>
    </body>
</root>

Note that the first instance of <p> in this example should be transformed to Body_Text, but it is being transformed as Flush_Text.


Solution

  • To produce the wanted results, I changed the Flush_Text template's match from <xsl:template match="sec//p"> to <xsl:template match="p[ancestor::sec[last()][not(@sec-type)]]"> and removed the xsl:if. The xsl:if tested @sec-type on the <p> itself rather than on any <sec>, and since <p> never carries that attribute, the test was always true.

    Here is the corrected XSL:

    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output omit-xml-declaration="yes" indent="yes" method="xml"/>
    <xsl:strip-space elements="*"/>
    
        <!-- identity rule -->
        <xsl:template match="node()|@*">
            <xsl:copy>
                <xsl:apply-templates select="node()|@*"/>
            </xsl:copy>
        </xsl:template>
    
            <!-- Body_Text -->
            <xsl:template match="body//p">
                <Body_Text>
                    <xsl:apply-templates select="@*|node()"/>
                </Body_Text>
            </xsl:template>
    
        <!-- Flush_Text -->
        <xsl:template match="p[ancestor::sec[last()][not(@sec-type)]]">
            <Flush_Text>
                <xsl:apply-templates select="@*|node()"/>
            </Flush_Text>
        </xsl:template>
    
            <!-- H1 -->
            <xsl:template match="sec//title">
                <H1>
                    <xsl:apply-templates select="@*|node()"/>
                </H1>
            </xsl:template>
    
            <!-- H2 -->
            <xsl:template match="sec//sec//title">
                <H2>
                    <xsl:apply-templates select="@*|node()"/>
                </H2>
            </xsl:template>
    
            <!-- H3 -->
            <xsl:template match="sec//sec//sec//title">
                <H3>
                    <xsl:apply-templates select="@*|node()"/>
                </H3>
            </xsl:template>
    </xsl:stylesheet>
    

    ...producing this desired output:

    <root>
    <body>
    <sec sec-type="asdf">
    <H1>This is an H1</H1>
    <sec>
    <H2>This is an H2</H2>
    <sec>
    <H3>This is an H3</H3>
    <Body_Text>This SHOULD be "Body_Text", but it's "Flush_Text"</Body_Text>
    </sec>
    
    </sec>
    
    </sec>
    
    <sec>
    <Flush_Text>This is Flush_Text</Flush_Text>
    </sec>
    <Body_Text>This is Body_Text</Body_Text>
    </body>
    </root>
    


    This was tested at http://xslt.online-toolz.com/tools/xslt-transformation.php.

    Thanks @Tomalak for pointing me in the right direction in the use of the ancestor xpath axis.
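
    For anyone unsure how positions work on the ancestor axis, here is a small diagnostic sketch. It is not part of the fix, and the bracketed text output format is only my own illustration. ancestor:: is a reverse axis, so position 1 is the closest ancestor and last() is the one furthest from the context node:

    ```xml
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output method="text"/>

      <!-- For every <p>, report the sec-type of its nearest and of its outermost <sec> ancestor. -->
      <xsl:template match="/">
        <xsl:for-each select="//p">
          <!-- ancestor:: is a reverse axis: [1] is the closest <sec> -->
          <xsl:text>nearest sec-type=[</xsl:text>
          <xsl:value-of select="ancestor::sec[1]/@sec-type"/>
          <!-- [last()] is therefore the <sec> furthest up the tree -->
          <xsl:text>] outermost sec-type=[</xsl:text>
          <xsl:value-of select="ancestor::sec[last()]/@sec-type"/>
          <xsl:text>]&#10;</xsl:text>
        </xsl:for-each>
      </xsl:template>
    </xsl:stylesheet>
    ```

    Run against the XML from the question, the first <p> should report an empty nearest sec-type and asdf as the outermost one, which is why the [last()] predicate is the one that sees the sec-type attribute.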

    Here I have matched any <p> whose last ancestor <sec> (what I was incorrectly calling the "highest parent") does not have the attribute sec-type, and transformed it to Flush_Text. "Last" here means the outermost <sec>, because ancestor is a reverse axis. This prevents the first <p> in this example, which has <sec sec-type... as its last ancestor, from becoming Flush_Text, so the Body_Text template applies to it instead.
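
    One caveat, as far as I understand the XSLT 1.0 conflict rules: body//p and p[ancestor::sec[last()][not(@sec-type)]] both get the default priority 0.5, so for a <p> that matches both, the stylesheet relies on the processor recovering by picking the template that comes last. A sketch that makes the override explicit could look like this (the priority values 1 and 2 are my own choice; any pair where Flush_Text is higher would do):

    ```xml
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output omit-xml-declaration="yes" indent="yes" method="xml"/>
      <xsl:strip-space elements="*"/>

      <!-- identity rule -->
      <xsl:template match="node()|@*">
        <xsl:copy>
          <xsl:apply-templates select="node()|@*"/>
        </xsl:copy>
      </xsl:template>

      <!-- Body_Text: general rule for every <p> under <body> -->
      <xsl:template match="body//p" priority="1">
        <Body_Text>
          <xsl:apply-templates select="@*|node()"/>
        </Body_Text>
      </xsl:template>

      <!-- Flush_Text: wins explicitly when the outermost <sec> has no sec-type -->
      <xsl:template match="p[ancestor::sec[last()][not(@sec-type)]]" priority="2">
        <Flush_Text>
          <xsl:apply-templates select="@*|node()"/>
        </Flush_Text>
      </xsl:template>
    </xsl:stylesheet>
    ```

    With explicit priorities the result no longer depends on the order of the two templates in the file.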

    Also, I like Tomalak's approach of automating H1 to H3... I am still experimenting with this, and don't want to use it until I fully understand it ;)
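
    For reference, one way such automation might look is sketched below. This is my own guess at the idea, not Tomalak's actual code, and it assumes every <title> is a direct child of its <sec>. The heading level is simply the number of <sec> ancestors:

    ```xml
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output omit-xml-declaration="yes" indent="yes" method="xml"/>
      <xsl:strip-space elements="*"/>

      <!-- identity rule -->
      <xsl:template match="node()|@*">
        <xsl:copy>
          <xsl:apply-templates select="node()|@*"/>
        </xsl:copy>
      </xsl:template>

      <!-- H1, H2, H3, ...: element name built from the nesting depth of <sec> -->
      <xsl:template match="sec/title">
        <xsl:element name="H{count(ancestor::sec)}">
          <xsl:apply-templates select="@*|node()"/>
        </xsl:element>
      </xsl:template>
    </xsl:stylesheet>
    ```

    This replaces the three separate title templates with one. Note that it does not stop at H3: a title nested four levels deep would become H4.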