xml xslt-2.0

Edit an XML file: search values are taken from one file, replacement values from a second file, and the search must be case-insensitive


My source looks something like this:

<?xml version="1.0"?>
<TABLE NAME="TEST">
<DATA RECORDS="78">
<catalog>
   <book id="bk109">
      <description>An anthology of horror stories about roaches, centipedes, scorpions  and other insects.</description>
   </book>
   <book id="bk110">
      <description>Microsoft's .NET initiative is explored in detail in this deep programmer's reference.</description>
   </book>
   <book id="bk111">
      <description>An anthology of HORROR stories about roaches, centipedes, scorpions  and other insects.</description>
   </book>
   <book id="bk112">
      <description>An anthology of horror stories about roaches, centipedes, scorpions  and other insects.</description>
   </book>
   <book id="bk113">
      <description>An anthology of horror stories about roaches, centipedes, scorpions  and other insects.</description>
   </book>
   <book id="bk114">
      <description>Microsoft's .NET initiative is explored in detail in this deep PROGRAMMER's reference.</description>
   </book>
   <book id="bk115">
      <description>An anthology of HORROR stories about roaches, centipedes, scorpions  and other insects.</description>
   </book>
   <book id="bk116">
      <description>An anthology of horror stories about roaches, centipedes, scorpions  and other insects. Beware, this must not be matched.</description>
   </book>
   <book id="bk114">
      <description>Microsoft's .NET initiative is explored in detail in this deep PROGRAMMER's reference. Beware, this must not be matched.</description>
   </book>
</catalog>
</DATA>
</TABLE>

search.txt file contains:

An anthology of horror stories about roaches, centipedes, scorpions  and other insects.
Microsoft's .NET initiative is explored in detail in this deep programmer's reference.

replace.txt file contains:

Value we need to store in the (description) element.
Another value we need to store in the (description) element.

The search should be case-insensitive, so both

<description>An anthology of horror stories about roaches, centipedes, scorpions  and other insects.</description>

and

<description>An anthology of HORROR stories about roaches, centipedes, scorpions  and other insects.</description>

should be matched and replaced, so the resulting XML should look like this:

<?xml version="1.0"?>
<TABLE NAME="TEST">
<DATA RECORDS="78">
<catalog>
   <book id="bk109">
      <description>Value we need to store in the (description) element.</description>
   </book>
   <book id="bk110">
      <description>Another value we need to store in the (description) element.</description>
   </book>
   <book id="bk111">
      <description>Value we need to store in the (description) element.</description>
   </book>
   <book id="bk112">
      <description>Value we need to store in the (description) element.</description>
   </book>
   <book id="bk113">
      <description>Value we need to store in the (description) element.</description>
   </book>
   <book id="bk114">
      <description>Another value we need to store in the (description) element.</description>
   </book>
   <book id="bk115">
      <description>Value we need to store in the (description) element.</description>
   </book>
   <book id="bk116">
      <description>An anthology of horror stories about roaches, centipedes, scorpions  and other insects. Beware, this must not be matched.</description>
   </book>
   <book id="bk114">
      <description>Microsoft's .NET initiative is explored in detail in this deep PROGRAMMER's reference. Beware, this must not be matched.</description>
   </book>
</catalog>
</DATA>
</TABLE>

I have tried the functions from http://www.xqueryfunctions.com/ with no luck. Note that special characters, such as a period or parentheses, may appear in the search or replacement values, and that an exact (though case-insensitive) match is needed for the replacement to take place: see the last two description nodes, which are not replaced.

UPDATE: here is what I have tried; it does not work when the replacement string is not a single word:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:functx="http://www.functx.com"
    exclude-result-prefixes="xs functx"
    version="2.0">

    <xsl:param name="search-file" as="xs:string" select="'search.txt'"/>
    <xsl:param name="replacement-file" as="xs:string" select="'replace.txt'"/>


    <xsl:param name="search-terms" as="xs:string*" select="tokenize(unparsed-text($search-file), '\r?\n')"/>

    <xsl:param name="search-terms-is" as="xs:string*" select="for $term in $search-terms return lower-case(functx:escape-for-regex($term))"/>

    <xsl:param name="replace-terms" as="xs:string*" select="tokenize(unparsed-text($replacement-file), '\r?\n')"/>

    <xsl:include href="http://www.xsltfunctions.com/xsl/functx-1.0-nodoc-2007-01.xsl"/>

    <xsl:function name="functx:replace-multi" as="xs:string?"
        xmlns:functx="http://www.functx.com">
        <xsl:param name="arg" as="xs:string?"/>
        <xsl:param name="changeFrom" as="xs:string*"/>
        <xsl:param name="changeTo" as="xs:string*"/>
        <xsl:param name="flags" as="xs:string"/>

        <xsl:sequence select="
            if (count($changeFrom) > 0)
            then functx:replace-multi(
            replace($arg, $changeFrom[1],
            functx:if-absent($changeTo[1],''), $flags),
            $changeFrom[position() > 1],
            $changeTo[position() > 1])
            else $arg
            "/>

    </xsl:function>

    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="description[some $search-term in $search-terms-is satisfies matches(current(), $search-term, 'i')]">
        <xsl:copy>
            <xsl:variable name="matched-terms" as="xs:string*" select="$search-terms-is[matches(current(), ., 'i')]"/>
            <xsl:variable name="replacements" as="xs:string*" select="for $t in $matched-terms return $replace-terms[position() = index-of($search-terms-is, $t)]"/>
            <xsl:value-of
                select="functx:replace-multi(., $matched-terms, $replacements, 'i')"/>
        </xsl:copy>
    </xsl:template>

</xsl:stylesheet>

Solution

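  • Given your samples and explanation, I think a single description element can match at most one search term, so your code should work with the following simplification, which anchors each escaped term with ^ and $ so that only a match of the whole description counts: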

    <!-- Anchor each escaped term with ^ and $ so that only a whole-string match counts. -->
    <xsl:param name="search-terms-is" as="xs:string*" select="for $term in $search-terms return concat('^', lower-case(functx:escape-for-regex($term)), '$')"/>
    
    <xsl:template match="description[some $search-term in $search-terms-is satisfies matches(., $search-term, 'i')]">
        <xsl:copy>
            <!-- The anchored pattern that matches this description. -->
            <xsl:variable name="matched-term" as="xs:string" select="$search-terms-is[matches(current(), ., 'i')]"/>
            <!-- The replacement found at the same position in the replacement list. -->
            <xsl:variable name="replacement" as="xs:string" select="$replace-terms[index-of($search-terms-is, $matched-term)]"/>
            <xsl:value-of select="$replacement"/>
        </xsl:copy>
    </xsl:template>
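
    Because the match has to cover the whole description, the same effect can probably be had without regular expressions, by comparing lower-cased strings. What follows is a minimal sketch of that alternative, not the approach shown above: the variable name search-keys is hypothetical, the file names and the other parameter names are taken from the question, and the two files are assumed to have the same number of lines with no duplicate search lines.

    ```xml
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:xs="http://www.w3.org/2001/XMLSchema"
        exclude-result-prefixes="xs"
        version="2.0">

        <xsl:param name="search-file" as="xs:string" select="'search.txt'"/>
        <xsl:param name="replacement-file" as="xs:string" select="'replace.txt'"/>

        <!-- One lower-cased key per line of the search file (hypothetical name). -->
        <xsl:variable name="search-keys" as="xs:string*"
            select="for $t in tokenize(unparsed-text($search-file), '\r?\n') return lower-case($t)"/>

        <!-- Replacement lines, paired with the search lines by position. -->
        <xsl:variable name="replace-terms" as="xs:string*"
            select="tokenize(unparsed-text($replacement-file), '\r?\n')"/>

        <!-- Identity template: copy everything that is not handled below. -->
        <xsl:template match="@* | node()">
            <xsl:copy>
                <xsl:apply-templates select="@* | node()"/>
            </xsl:copy>
        </xsl:template>

        <!-- A description matches only if its whole lower-cased text equals a key. -->
        <xsl:template match="description[lower-case(.) = $search-keys]">
            <xsl:copy>
                <!-- index-of(...)[1] picks the first matching position. -->
                <xsl:value-of
                    select="$replace-terms[index-of($search-keys, lower-case(current()))[1]]"/>
            </xsl:copy>
        </xsl:template>

    </xsl:stylesheet>
    ```

    Since no regular expression is involved, characters such as periods and parentheses in the search lines need no escaping, and the functx include is not required for this variant.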
    

    Here is a more complete example, with the contents of the two external files inlined as parameters:

    <xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0"
        xmlns:xs="http://www.w3.org/2001/XMLSchema"
        xmlns:functx="http://www.functx.com"
        exclude-result-prefixes="xs functx">
    
        <xsl:param name="search-text" as="xs:string">An anthology of horror stories about roaches, centipedes, scorpions  and other insects.
    Microsoft's .NET initiative is explored in detail in this deep programmer's reference.</xsl:param>
    
        <xsl:param name="replacement-text" as="xs:string">Value we need to store in the (description) element.
    Another value we need to store in the (description) element.</xsl:param>
    
        <xsl:param name="search-terms" as="xs:string*" select="tokenize($search-text, '\r?\n')"/>
    
        <xsl:param name="search-terms-is" as="xs:string*" select="for $term in $search-terms return concat('^', lower-case(functx:escape-for-regex($term)), '$')"/>
    
        <xsl:param name="replace-terms" as="xs:string*" select="tokenize($replacement-text, '\r?\n')"/>
    
        <xsl:include href="http://www.xsltfunctions.com/xsl/functx-1.0-nodoc-2007-01.xsl"/>
    
        <xsl:template match="@*|node()">
            <xsl:copy>
                <xsl:apply-templates select="@*|node()"/>
            </xsl:copy>
        </xsl:template>
    
    
        <xsl:template match="description[some $search-term in $search-terms-is satisfies matches(., $search-term, 'i')]">
            <xsl:copy>
                <xsl:variable name="matched-term" as="xs:string" select="$search-terms-is[matches(current(), ., 'i')]"/>
                <xsl:variable name="replacement" as="xs:string" select="$replace-terms[index-of($search-terms-is, $matched-term)]"/>
                <xsl:value-of select="$replacement"/>
            </xsl:copy>
        </xsl:template>
    
    </xsl:transform>
    

    Online at http://xsltransform.net/gVhD8RA.
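
    When the terms are read from the external files again, one thing that may be worth guarding against is a trailing newline: tokenize() then yields an empty last term, and the two lists can end up with different lengths. The following is a rough sketch of a sanity check under that assumption. It only validates the two files and copies the input unchanged; the file names and parameter names come from the question, and the wording of the error message is made up for illustration.

    ```xml
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:xs="http://www.w3.org/2001/XMLSchema"
        exclude-result-prefixes="xs"
        version="2.0">

        <xsl:param name="search-file" as="xs:string" select="'search.txt'"/>
        <xsl:param name="replacement-file" as="xs:string" select="'replace.txt'"/>

        <!-- Strip trailing line breaks before splitting, so that a final
             newline does not produce an empty last term. -->
        <xsl:variable name="search-terms" as="xs:string*"
            select="tokenize(replace(unparsed-text($search-file), '(\r?\n)+$', ''), '\r?\n')"/>
        <xsl:variable name="replace-terms" as="xs:string*"
            select="tokenize(replace(unparsed-text($replacement-file), '(\r?\n)+$', ''), '\r?\n')"/>

        <xsl:template match="/">
            <!-- The terms are paired by position, so the counts must agree. -->
            <xsl:if test="count($search-terms) ne count($replace-terms)">
                <xsl:message terminate="yes">
                    <xsl:text>Line counts differ: </xsl:text>
                    <xsl:value-of select="count($search-terms)"/>
                    <xsl:text> search lines, </xsl:text>
                    <xsl:value-of select="count($replace-terms)"/>
                    <xsl:text> replacement lines.</xsl:text>
                </xsl:message>
            </xsl:if>
            <!-- This sketch performs no replacement; it copies the input as is. -->
            <xsl:copy-of select="."/>
        </xsl:template>

    </xsl:stylesheet>
    ```

    Note that stripping trailing line breaks also drops a last replacement line that is intentionally empty, so this is only suitable when every line is meant to carry a value.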