
XML case-insensitive replacements using external files and XSLT


I have the following XML file, input.xml, as input:

<?xml version="1.0" encoding="UTF-8"?>
<TABLE NAME="ITEMS.DB">
   <DATA RECORDS="33673">
      <RECORD ID="1">
         <ID>1</ID>
         <ROW>0</ROW>
         <DATE>19/9/2003 12:31:54 μμ</DATE>
         <al>29/6/2005 10:46:42 πμ</al>
         <KIT>46123</KIT>
         <KAP>08</KAP>
         <YTE>A.IV.C.54</YTE>
         <HTE>0</HTE>
         <HEN>0</HEN>
         <SUM>0</SUM>
         <LYW>0</LYW>
         <AMF>29</AMF>
         <MANUFACTURER>ASUS</MANUFACTURER>
         <AME>pan</AME>
      </RECORD>
      <RECORD ID="1">
         <ID>1</ID>
         <ROW>0</ROW>
         <DATE>19/9/2003 12:31:54 μμ</DATE>
         <al>29/6/2005 10:46:42 πμ</al>
         <KIT>46123</KIT>
         <KAP>08</KAP>
         <YTE>A.IV.C.54</YTE>
         <HTE>0</HTE>
         <HEN>0</HEN>
         <SUM>0</SUM>
         <LYW>0</LYW>
         <AMF>29</AMF>
         <MANUFACTURER>asus</MANUFACTURER>
         <AME>pan</AME>
      </RECORD>
      <RECORD ID="14">
         <ID>1</ID>
         <ROW>0</ROW>
         <DATE>19/9/2003 12:31:54 μμ</DATE>
         <al>29/6/2005 10:46:42 πμ</al>
         <KIT>46123</KIT>
         <KAP>08</KAP>
         <YTE>A.IV.C.54</YTE>
         <HTE>0</HTE>
         <HEN>0</HEN>
         <SUM>0</SUM>
         <LYW>0</LYW>
         <AMF>29</AMF>
         <MANUFACTURER>Creative Labs</MANUFACTURER>
         <AME>pan</AME>
      </RECORD>
      <RECORD ID="1">
         <ID>1</ID>
         <ROW>0</ROW>
         <DATE>19/9/2003 12:31:54 μμ</DATE>
         <al>29/6/2005 10:46:42 πμ</al>
         <KIT>46123</KIT>
         <KAP>08</KAP>
         <YTE>A.IV.C.54</YTE>
         <HTE>0</HTE>
         <HEN>0</HEN>
         <SUM>0</SUM>
         <LYW>0</LYW>
         <AMF>29</AMF>
         <MANUFACTURER>Creative labs</MANUFACTURER>
         <AME>pan</AME>
      </RECORD>
      <RECORD ID="14">
         <ID>1</ID>
         <ROW>0</ROW>
         <DATE>19/9/2003 12:31:54 μμ</DATE>
         <al>29/6/2005 10:46:42 πμ</al>
         <KIT>46123</KIT>
         <KAP>08</KAP>
         <YTE>A.IV.C.54</YTE>
         <HTE>0</HTE>
         <HEN>0</HEN>
         <SUM>0</SUM>
         <LYW>0</LYW>
         <AMF>29</AMF>
         <MANUFACTURER>This is a test. Replace (all)</MANUFACTURER>
         <AME>pan</AME>
      </RECORD>
   </DATA>
</TABLE>

Then file search.txt:

ASUS
creative labs
This is a test. Replace (all)

and replace.txt:

GIGABYTE
LOGITECH
REPLACEMENT

I am looking for a way, using XSLT 2.0, to make case-insensitive replacements: each value found in search.txt should be replaced with the corresponding value from replace.txt, so the resulting XML should be:

<?xml version="1.0" encoding="UTF-8"?>
<TABLE NAME="ITEMS.DB">
   <DATA RECORDS="33673">
      <RECORD ID="1">
         <ID>1</ID>
         <ROW>0</ROW>
         <DATE>19/9/2003 12:31:54 μμ</DATE>
         <al>29/6/2005 10:46:42 πμ</al>
         <KIT>46123</KIT>
         <KAP>08</KAP>
         <YTE>A.IV.C.54</YTE>
         <HTE>0</HTE>
         <HEN>0</HEN>
         <SUM>0</SUM>
         <LYW>0</LYW>
         <AMF>29</AMF>
         <MANUFACTURER>GIGABYTE</MANUFACTURER>
         <AME>pan</AME>
      </RECORD>
      <RECORD ID="1">
         <ID>1</ID>
         <ROW>0</ROW>
         <DATE>19/9/2003 12:31:54 μμ</DATE>
         <al>29/6/2005 10:46:42 πμ</al>
         <KIT>46123</KIT>
         <KAP>08</KAP>
         <YTE>A.IV.C.54</YTE>
         <HTE>0</HTE>
         <HEN>0</HEN>
         <SUM>0</SUM>
         <LYW>0</LYW>
         <AMF>29</AMF>
         <MANUFACTURER>GIGABYTE</MANUFACTURER>
         <AME>pan</AME>
      </RECORD>
      <RECORD ID="1">
         <ID>1</ID>
         <ROW>0</ROW>
         <DATE>19/9/2003 12:31:54 μμ</DATE>
         <al>29/6/2005 10:46:42 πμ</al>
         <KIT>46123</KIT>
         <KAP>08</KAP>
         <YTE>A.IV.C.54</YTE>
         <HTE>0</HTE>
         <HEN>0</HEN>
         <SUM>0</SUM>
         <LYW>0</LYW>
         <AMF>29</AMF>
         <MANUFACTURER>LOGITECH</MANUFACTURER>
         <AME>pan</AME>
      </RECORD>
      <RECORD ID="1">
         <ID>1</ID>
         <ROW>0</ROW>
         <DATE>19/9/2003 12:31:54 μμ</DATE>
         <al>29/6/2005 10:46:42 πμ</al>
         <KIT>46123</KIT>
         <KAP>08</KAP>
         <YTE>A.IV.C.54</YTE>
         <HTE>0</HTE>
         <HEN>0</HEN>
         <SUM>0</SUM>
         <LYW>0</LYW>
         <AMF>29</AMF>
         <MANUFACTURER>LOGITECH</MANUFACTURER>
         <AME>pan</AME>
      </RECORD>
      <RECORD ID="1">
         <ID>1</ID>
         <ROW>0</ROW>
         <DATE>19/9/2003 12:31:54 μμ</DATE>
         <al>29/6/2005 10:46:42 πμ</al>
         <KIT>46123</KIT>
         <KAP>08</KAP>
         <YTE>A.IV.C.54</YTE>
         <HTE>0</HTE>
         <HEN>0</HEN>
         <SUM>0</SUM>
         <LYW>0</LYW>
         <AMF>29</AMF>
         <MANUFACTURER>REPLACEMENT</MANUFACTURER>
         <AME>pan</AME>
      </RECORD>
   </DATA>
</TABLE>

The replacement algorithm works like this: whatever is found on line 1 of search.txt must be replaced with what is found on line 1 of replace.txt, and so on for each line. The stylesheet should accept the filenames of search.txt and replace.txt as parameters; both files are in the same directory as the XML and XSL files. Each record contains at most one MANUFACTURER element. Apart from the . character, the values involved in the replacements may also contain other special characters.


Solution

  • Assuming the search and replacement terms are stored one per line in text files, and that some terms contain characters that must be escaped in regular expressions, I have written the following stylesheet, which uses the functx library:

    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:xs="http://www.w3.org/2001/XMLSchema"
        xmlns:functx="http://www.functx.com"
        exclude-result-prefixes="xs functx"
        version="2.0">
    
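        <!-- Relative file names are resolved against the stylesheet's location. -->
        <xsl:param name="search-file" as="xs:string" select="'search.txt'"/>
        <xsl:param name="replacement-file" as="xs:string" select="'replace.txt'"/>
    
        <!-- One term per line; blank lines (e.g. the one produced by a trailing newline) are dropped. -->
        <xsl:param name="search-terms" as="xs:string*" select="tokenize(unparsed-text($search-file), '\r?\n')[normalize-space(.)]"/>
    
        <!-- Search terms escaped so they can be used as regular expressions. -->
        <xsl:param name="search-terms-is" as="xs:string*" select="for $term in $search-terms return lower-case(functx:escape-for-regex($term))"/>
    
        <xsl:param name="replace-terms" as="xs:string*" select="tokenize(unparsed-text($replacement-file), '\r?\n')[normalize-space(.)]"/>
    
        <!-- Provides functx:escape-for-regex. -->
        <xsl:include href="http://www.xsltfunctions.com/xsl/functx-1.0-nodoc-2007-01.xsl"/>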
    
        <xsl:template match="@* | node()">
            <xsl:copy>
                <xsl:apply-templates select="@* | node()"/>
            </xsl:copy>
        </xsl:template>
    
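        <!-- Matches MANUFACTURER elements whose text contains one of the search terms, ignoring case. -->
        <xsl:template match="MANUFACTURER[$search-terms-is[matches(current(), ., 'i')]]">
            <xsl:copy>
                <xsl:variable name="matched-term" as="xs:string" select="$search-terms-is[matches(current(), ., 'i')]"/>
                <!-- Use the replacement at the same position as the matched search term. -->
                <xsl:value-of
                    select="replace(., $matched-term, $replace-terms[index-of($search-terms-is, $matched-term)], 'i')"/>
            </xsl:copy>
        </xsl:template>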
    
    </xsl:stylesheet>
    

    That stylesheet works for me with your edited input snippets: I don't get any errors, and the contents of the matched MANUFACTURER elements are replaced.
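
    If each MANUFACTURER value is always replaced as a whole, as in the sample data, the regular expressions can be avoided entirely by comparing lower-cased strings. The following is only a sketch under that assumption, not a drop-in replacement for the stylesheet above: it does not use functx, the default file names are taken from the question, and the variable names terms-lc and replacements are made up for illustration.

    ```xml
    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:xs="http://www.w3.org/2001/XMLSchema"
        exclude-result-prefixes="xs"
        version="2.0">

        <xsl:param name="search-file" as="xs:string" select="'search.txt'"/>
        <xsl:param name="replacement-file" as="xs:string" select="'replace.txt'"/>

        <!-- Illustrative names: lower-cased search terms and their replacements, one per line. -->
        <xsl:variable name="terms-lc" as="xs:string*"
            select="for $t in tokenize(unparsed-text($search-file), '\r?\n') return lower-case($t)"/>
        <xsl:variable name="replacements" as="xs:string*"
            select="tokenize(unparsed-text($replacement-file), '\r?\n')"/>

        <!-- Identity template: copy everything else unchanged. -->
        <xsl:template match="@* | node()">
            <xsl:copy>
                <xsl:apply-templates select="@* | node()"/>
            </xsl:copy>
        </xsl:template>

        <!-- Assumption: the whole element value must equal a search term, ignoring case. -->
        <xsl:template match="MANUFACTURER[lower-case(.) = $terms-lc]">
            <xsl:copy>
                <!-- Pick the replacement at the position of the first matching search term. -->
                <xsl:value-of select="$replacements[index-of($terms-lc, lower-case(current()))[1]]"/>
            </xsl:copy>
        </xsl:template>

    </xsl:stylesheet>
    ```

    Because no regular expression is involved, characters such as . ( ) or $ in either file need no escaping here. The trade-off is that a term appearing only as part of a longer value is left untouched. Either stylesheet takes the file names as parameters; assuming Saxon is the processor, they can be passed on the command line as search-file=search.txt replacement-file=replace.txt.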