
XSLT - Using substring with copy-of to preserve inner HTML tags


I have some XML like this:

<story><p><strong>Lorem ipsum dolor sit amet, consectetur adipiscing elit.</strong>Nulla vel mauris metus. Etiam vel tortor vel magna bibendum euismod nec varius turpis. Nullam ullamcorper, nunc vel auctor consectetur, quam felis accumsan eros, lacinia fringilla mauris est vel lectus. Curabitur et tortor eros. Duis sed convallis metus. Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas. Cras tempus quam sed enim gravida bibendum. Vestibulum magna ligula, varius in sodales eu, ultricies volutpat sem. Phasellus ante justo, vestibulum eu hendrerit a, posuere vitae est. Integer at pulvinar est.</p><p>Quisque a commodo eros. Integer tempus mi sit amet leo consectetur adipiscing. Nullam sit amet enim metus. Curabitur sollicitudin egestas arcu, at convallis enim iaculis eget. Etiam faucibus, justo sit amet lacinia consectetur, purus nunc rhoncus dui, id malesuada tortor est sed orci. Quisque eget nisi vitae mi facilisis varius. Integer fringilla eros sit amet velit vehicula commodo. </p><br /><span>And some more text here</span> </story>

I want to do this:

<xsl:copy-of select="substring(story/node(),1,500)"/>

Here is the problem. I lose the <p>, <strong>, <br /> and other HTML tags inside the <story> tag whenever I take the substring. Is there any way to get the first 500 characters of the story tag while keeping the inner HTML tags?

Thanks!


Solution

  • Here is another approach in XSLT 1.0, without having to use the node-set extension:

      <xsl:template match="@*|node()" mode="limit-length">
        <xsl:param name="length"/>
        <xsl:copy>
          <xsl:apply-templates select="@*" mode="limit-length"/>
          <xsl:call-template name="copy-nodes">
            <xsl:with-param name="nodes" select="node()"/>
            <xsl:with-param name="length" select="$length"/>
          </xsl:call-template>
        </xsl:copy>
      </xsl:template>
    
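      <!-- Explicit priority: text() and node() share the default priority (-0.5), so the match would otherwise be ambiguous -->
      <xsl:template match="text()" mode="limit-length" priority="1">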
        <xsl:param name="length"/>
        <xsl:value-of select="substring(., 1, $length)"/>
      </xsl:template>
    
      <xsl:template name="copy-nodes">
        <xsl:param name="nodes"/>
        <xsl:param name="length"/>
        <xsl:if test="$length &gt; 0 and $nodes">
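          <!-- Process the first node, giving it the whole remaining budget -->
          <xsl:variable name="head" select="$nodes[1]"/>
          <xsl:apply-templates select="$head" mode="limit-length">
            <xsl:with-param name="length" select="$length"/>
          </xsl:apply-templates>
          <!-- Charge the full text length of that node against the budget -->
          <xsl:variable name="remaining" select="$length - string-length($head)"/>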
          <xsl:if test="$remaining &gt; 0 and count($nodes) &gt; 1">
            <xsl:call-template name="copy-nodes">
              <xsl:with-param name="nodes" select="$nodes[position() &gt; 1]"/>
              <xsl:with-param name="length" select="$remaining"/>
            </xsl:call-template>
          </xsl:if>
        </xsl:if>
      </xsl:template>
    

    This is essentially the identity template, except that copying the child nodes is delegated to a recursive named template, which tracks how many characters are still allowed, and a separate template truncates text nodes to that remaining length.
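
    For contrast, the expression in the question drops the markup because substring() converts its first argument to a string before slicing it: a node-set becomes the string-value of its first node, which is text with no tags, and xsl:copy-of applied to a string behaves like xsl:value-of. A minimal sketch of that equivalence, assuming <story> is the document element; the <demo>, <attempt> and <equivalent> wrapper elements are made up for illustration, and both should end up holding the same plain text:

    ```xml
    <xsl:stylesheet version="1.0"
                    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output method="xml" omit-xml-declaration="yes"/>

      <xsl:template match="/">
        <demo>
          <!-- The attempt from the question -->
          <attempt>
            <xsl:copy-of select="substring(story/node(), 1, 500)"/>
          </attempt>
          <!-- What it amounts to: slice the string-value of the first child only -->
          <equivalent>
            <xsl:value-of select="substring(string(story/node()[1]), 1, 500)"/>
          </equivalent>
        </demo>
      </xsl:template>
    </xsl:stylesheet>
    ```

    So the tags are already gone before substring() returns, which is why the truncation has to happen while walking the tree rather than on a string.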

    You can invoke this for the sample input as follows, from a template whose context node is the parent of <story>:

    <xsl:call-template name="copy-nodes">
      <xsl:with-param name="nodes" select="story/node()"/>
      <xsl:with-param name="length" select="500"/>
    </xsl:call-template>
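
    To try the templates end to end, they can be wrapped in a complete stylesheet. The following is a minimal sketch rather than part of the original answer: the top-level parameter name limit, the literal <story> wrapper in the output and the priority attribute on the text() template are my own additions, and it assumes <story> is the document element.

    ```xml
    <xsl:stylesheet version="1.0"
                    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output method="xml" omit-xml-declaration="yes"/>

      <!-- Hypothetical parameter: maximum number of text characters to keep -->
      <xsl:param name="limit" select="500"/>

      <xsl:template match="/">
        <story>
          <xsl:call-template name="copy-nodes">
            <xsl:with-param name="nodes" select="story/node()"/>
            <xsl:with-param name="length" select="$limit"/>
          </xsl:call-template>
        </story>
      </xsl:template>

      <!-- Identity-style copy whose children are copied under a length budget -->
      <xsl:template match="@*|node()" mode="limit-length">
        <xsl:param name="length"/>
        <xsl:copy>
          <xsl:apply-templates select="@*" mode="limit-length"/>
          <xsl:call-template name="copy-nodes">
            <xsl:with-param name="nodes" select="node()"/>
            <xsl:with-param name="length" select="$length"/>
          </xsl:call-template>
        </xsl:copy>
      </xsl:template>

      <!-- Text nodes are cut to whatever budget is left -->
      <xsl:template match="text()" mode="limit-length" priority="1">
        <xsl:param name="length"/>
        <xsl:value-of select="substring(., 1, $length)"/>
      </xsl:template>

      <!-- Copies nodes one by one until the budget is used up -->
      <xsl:template name="copy-nodes">
        <xsl:param name="nodes"/>
        <xsl:param name="length"/>
        <xsl:if test="$length &gt; 0 and $nodes">
          <xsl:variable name="head" select="$nodes[1]"/>
          <xsl:apply-templates select="$head" mode="limit-length">
            <xsl:with-param name="length" select="$length"/>
          </xsl:apply-templates>
          <xsl:variable name="remaining" select="$length - string-length($head)"/>
          <xsl:if test="$remaining &gt; 0 and count($nodes) &gt; 1">
            <xsl:call-template name="copy-nodes">
              <xsl:with-param name="nodes" select="$nodes[position() &gt; 1]"/>
              <xsl:with-param name="length" select="$remaining"/>
            </xsl:call-template>
          </xsl:if>
        </xsl:if>
      </xsl:template>
    </xsl:stylesheet>
    ```

    As a small check, with the input <story><p><strong>Hello</strong> world</p><span>more</span></story> and limit set to 8 (for example via xsltproc --param limit 8, with stylesheet and input file names of your choosing), the result should be <story><p><strong>Hello</strong> wo</p></story>: the <strong> element survives, the text is cut after eight characters, and the <span> is dropped because the budget is already spent.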
    

    Follow-up: Splitting the story

    For the follow-up question of splitting the story into two pieces at the first line break or paragraph boundary after N characters, I'll make the simplifying assumption that you only want to split at <p> and <br> elements that are direct children of the <story> element (not nested at an arbitrary depth). This makes the problem much easier.

    Here is one way to do it. To get the contents of the first part, you could use a template that copies sibling nodes one after another and stops at the first br or p it reaches once the maximum string length has been used up.

      <xsl:template match="node()" mode="before-break">
        <xsl:param name="length"/>
        <!-- Keep going while budget remains; once it is spent, stop at the first <br> or <p> -->
        <xsl:if test="$length &gt; 0 or not(self::br or self::p)">
          <xsl:copy-of select="."/>
          <xsl:apply-templates select="following-sibling::node()[1]"
                               mode="before-break">
            <xsl:with-param name="length" select="$length - string-length(.)"/>
          </xsl:apply-templates>
        </xsl:if>
      </xsl:template>
    

    And for the second part, you could create another template which searches for the same condition as the previous template, but outputs nothing until after that point:

      <xsl:template match="node()" mode="after-break">
        <xsl:param name="length"/>
        <xsl:choose>
          <!-- Same test as in before-break: still in the first part, so output nothing and move on -->
          <xsl:when test="$length &gt; 0 or not(self::br or self::p)">
            <xsl:apply-templates select="following-sibling::node()[1]"
                                 mode="after-break">
              <xsl:with-param name="length" select="$length - string-length(.)"/>
            </xsl:apply-templates>
          </xsl:when>
          <xsl:otherwise>
            <xsl:if test="not(self::br)"> <!-- suppress the <br/> -->
              <xsl:copy-of select="."/>
            </xsl:if>
            <xsl:copy-of select="following-sibling::node()"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:template>
    

    And here's how you can use those templates to split a story into two <div>s.

      <xsl:template match="story">
        <xsl:copy>
          <xsl:copy-of select="@*"/>
          <div>
            <xsl:apply-templates select="node()[1]" mode="before-break">
              <xsl:with-param name="length" select="500"/>
            </xsl:apply-templates>
          </div>
          <div>
            <xsl:apply-templates select="node()[1]" mode="after-break">
              <xsl:with-param name="length" select="500"/>
            </xsl:apply-templates>
          </div>
        </xsl:copy>
      </xsl:template>
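
    To see the split on something small, the three templates can be combined into one stylesheet. This is only a sketch under a couple of assumptions: the top-level parameter name split-after is made up here (the answer hard-codes 500), and <story> is reached through the built-in template rules.

    ```xml
    <xsl:stylesheet version="1.0"
                    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output method="xml" omit-xml-declaration="yes"/>

      <!-- Hypothetical parameter: text length after which a split is allowed -->
      <xsl:param name="split-after" select="500"/>

      <xsl:template match="story">
        <xsl:copy>
          <xsl:copy-of select="@*"/>
          <div>
            <xsl:apply-templates select="node()[1]" mode="before-break">
              <xsl:with-param name="length" select="$split-after"/>
            </xsl:apply-templates>
          </div>
          <div>
            <xsl:apply-templates select="node()[1]" mode="after-break">
              <xsl:with-param name="length" select="$split-after"/>
            </xsl:apply-templates>
          </div>
        </xsl:copy>
      </xsl:template>

      <!-- First part: copy siblings until the budget is spent and a br or p is reached -->
      <xsl:template match="node()" mode="before-break">
        <xsl:param name="length"/>
        <xsl:if test="$length &gt; 0 or not(self::br or self::p)">
          <xsl:copy-of select="."/>
          <xsl:apply-templates select="following-sibling::node()[1]"
                               mode="before-break">
            <xsl:with-param name="length" select="$length - string-length(.)"/>
          </xsl:apply-templates>
        </xsl:if>
      </xsl:template>

      <!-- Second part: skip the same nodes, then copy everything from the split point on -->
      <xsl:template match="node()" mode="after-break">
        <xsl:param name="length"/>
        <xsl:choose>
          <xsl:when test="$length &gt; 0 or not(self::br or self::p)">
            <xsl:apply-templates select="following-sibling::node()[1]"
                                 mode="after-break">
              <xsl:with-param name="length" select="$length - string-length(.)"/>
            </xsl:apply-templates>
          </xsl:when>
          <xsl:otherwise>
            <!-- A br at the split point is dropped; a p starts the second part -->
            <xsl:if test="not(self::br)">
              <xsl:copy-of select="."/>
            </xsl:if>
            <xsl:copy-of select="following-sibling::node()"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:template>
    </xsl:stylesheet>
    ```

    With the input <story><p>First paragraph.</p><p>Second one.</p><br/><span>Tail</span></story> and split-after set to 10, the first <div> should contain only <p>First paragraph.</p> and the second should contain <p>Second one.</p><br/><span>Tail</span>. Note that a <br/> is only suppressed when it is the split point itself; later ones are copied unchanged.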