XSLT: get text between strings with different rules for different languages


I'm parsing an HTML page using XSLT. The page contains a piece of HTML from which I want to extract the publisher name:

<div>
    <span class="publisher_name">by xxx from TripAdvisor</span>
</div>

To parse it I use the following code:

<xsl:variable name="publisherTextNode" select=".//span[@class='publisher_name'][1]"/>
<xsl:if test="$publisherTextNode">
    <Publisher>
        <xsl:call-template name="string-trim">
            <xsl:with-param name="string" select="substring-before(substring-after($publisherTextNode, 'by'), 'from')" />
        </xsl:call-template>
    </Publisher>
</xsl:if>

So it should select the text between "by" and "from". The result should be xxx.

But this fails for languages other than English.

In Spanish, for example, the HTML looks like

<span class="publisher_name">por xxx de TripAdvisor</span>

and the XSLT returns an empty string because it cannot find the string "by".

So I want to add a similar rule to support the Spanish string as well, like

<xsl:with-param name="string" select="substring-before(substring-after($publisherTextNode, 'por'), 'de')" />

Can I somehow add these two rules to the existing XSLT stylesheet (maybe check whether the first rule returns an empty string and, if so, use the second rule?), or should I create a separate stylesheet for each language?


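The string-trim helper templates:

<!-- $whitespace is assumed to be a global variable defined elsewhere in the stylesheet -->
<xsl:template name="string-trim">
    <xsl:param name="string" />
    <xsl:param name="trim" select="$whitespace" />
    <!-- Trim the left side first, then pass the result to the right trim -->
    <xsl:call-template name="string-rtrim">
        <xsl:with-param name="string">
            <xsl:call-template name="string-ltrim">
                <xsl:with-param name="string" select="$string" />
                <xsl:with-param name="trim" select="$trim" />
            </xsl:call-template>
        </xsl:with-param>
        <xsl:with-param name="trim" select="$trim" />
    </xsl:call-template>
</xsl:template>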

<xsl:template name="string-ltrim">
    <xsl:param name="string" />
    <xsl:param name="trim" select="$whitespace" />

    <xsl:if test="string-length($string) &gt; 0">
        <xsl:choose>
            <!-- First character is in the trim set: drop it and recurse -->
            <xsl:when test="contains($trim, substring($string, 1, 1))">
                <xsl:call-template name="string-ltrim">
                    <xsl:with-param name="string" select="substring($string, 2)" />
                    <xsl:with-param name="trim" select="$trim" />
                </xsl:call-template>
            </xsl:when>
            <xsl:otherwise>
                <xsl:value-of select="$string" />
            </xsl:otherwise>
        </xsl:choose>
    </xsl:if>
</xsl:template>

<xsl:template name="string-rtrim">
    <xsl:param name="string" />
    <xsl:param name="trim" select="$whitespace" />

    <xsl:variable name="length" select="string-length($string)" />

    <xsl:if test="$length &gt; 0">
        <xsl:choose>
            <!-- Last character is in the trim set: drop it and recurse -->
            <xsl:when test="contains($trim, substring($string, $length, 1))">
                <xsl:call-template name="string-rtrim">
                    <xsl:with-param name="string" select="substring($string, 1, $length - 1)" />
                    <xsl:with-param name="trim" select="$trim" />
                </xsl:call-template>
            </xsl:when>
            <xsl:otherwise>
                <xsl:value-of select="$string" />
            </xsl:otherwise>
        </xsl:choose>
    </xsl:if>
</xsl:template>


Solution

  • How about something like this (XSLT 1.0):

    <xsl:call-template name="string-trim">
        <xsl:with-param name="string">
            <xsl:choose>
                <xsl:when test="contains($publisherTextNode, 'by ') and contains($publisherTextNode, ' from')">
                    <xsl:value-of select="substring-before(substring-after($publisherTextNode, 'by '), ' from')" />
                </xsl:when>
                <xsl:when test="contains($publisherTextNode, 'por ') and contains($publisherTextNode, ' de')">
                    <xsl:value-of select="substring-before(substring-after($publisherTextNode, 'por '), ' de')" />
                </xsl:when>
            </xsl:choose>
        </xsl:with-param>
    </xsl:call-template>
    

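    Note that there is a slight chance of a false positive in these tests: contains() matches anywhere in the string, and substring-after() and substring-before() split at the first occurrence, so a publisher name that itself contains "by " or " de" would be extracted incorrectly. For example, "por Juan de la Cruz de TripAdvisor" yields "Juan".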
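
    One way to narrow the false positives mentioned above is to anchor the leading word with starts-with() and to require whole-word delimiters. The following is only a sketch under a few assumptions: the input is well-formed XML, the text always begins with "by" or "por", and the variable names publisherText and publisher are made up for illustration. It uses normalize-space() for trimming instead of the string-trim template.

    ```xml
    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet version="1.0"
                    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
        <xsl:output method="xml" omit-xml-declaration="yes" indent="yes" />

        <xsl:template match="/">
            <!-- normalize-space() strips leading and trailing whitespace and
                 collapses inner runs, so no separate trim step is needed here -->
            <xsl:variable name="publisherText"
                          select="normalize-space(.//span[@class='publisher_name'][1])" />

            <!-- Hypothetical helper variable: the extracted name, or an empty
                 string when neither language rule matches -->
            <xsl:variable name="publisher">
                <xsl:choose>
                    <!-- English: "by xxx from TripAdvisor" -->
                    <xsl:when test="starts-with($publisherText, 'by ') and contains($publisherText, ' from ')">
                        <xsl:value-of select="substring-before(substring-after($publisherText, 'by '), ' from ')" />
                    </xsl:when>
                    <!-- Spanish: "por xxx de TripAdvisor" -->
                    <xsl:when test="starts-with($publisherText, 'por ') and contains($publisherText, ' de ')">
                        <xsl:value-of select="substring-before(substring-after($publisherText, 'por '), ' de ')" />
                    </xsl:when>
                </xsl:choose>
            </xsl:variable>

            <!-- Emit the element only when a rule produced a name -->
            <xsl:if test="string-length($publisher) &gt; 0">
                <Publisher>
                    <xsl:value-of select="$publisher" />
                </Publisher>
            </xsl:if>
        </xsl:template>
    </xsl:stylesheet>
    ```

    Applied to the Spanish sample from the question, this should produce <Publisher>xxx</Publisher>. The closing word is still matched at its first occurrence, so a name containing " de " or " from " would still be cut short; supporting another language means adding one more xsl:when branch.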