Filter against multiple strings for an element


I have data like this:

<?xml version="1.0" encoding="utf-8"?>
<TRANSFER xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.interlis.ch/INTERLIS2.3" xmlns="http://www.interlis.ch/INTERLIS2.3">
  <HEADERSECTION VERSION="2.3" SENDER="Holenstein Ingenieure AG">
    <MODELS>
      <MODEL NAME="SIA405_WASSER_2015_LV95" VERSION="05.10.2018" URI="http://www.sia.ch/405" />
    </MODELS>
  </HEADERSECTION>
  <DATASECTION>
    <SIA405_WASSER_2015_LV95.SIA405_Wasser BID="ch152ck800000002">
      <SIA405_WASSER_2015_LV95.SIA405_Wasser.Leitung TID="ch17fnufWDNTSs6C">
        <OBJ_ID>ch17fnufWDNTSs6C</OBJ_ID>
      </SIA405_WASSER_2015_LV95.SIA405_Wasser.Leitung>
      <SIA405_WASSER_2015_LV95.SIA405_Wasser.Absperrorgan TID="ch17fnuf3f5P2bPQ">
        <OBJ_ID>ch17fnuf3f5P2bPQ</OBJ_ID>
        <SymbolOri>115.1</SymbolOri>
        <Lagebestimmung>genau</Lagebestimmung>
      </SIA405_WASSER_2015_LV95.SIA405_Wasser.Absperrorgan>
    </SIA405_WASSER_2015_LV95.SIA405_Wasser>
  </DATASECTION>
</TRANSFER>

In DATASECTION, there are nodes with different names and different structures, but every one of them has an OBJ_ID element. In reality, the file has 20000 - 40000 nodes, each with a unique OBJ_ID. On the other side, I have a list (a text file) with hundreds of OBJ_IDs. The tasks are:

  • a) create a new XML with all the nodes where the OBJ_ID is defined in the list
  • b) create a new XML where all the nodes on the list are removed (not copied)

Based on the thread "How to compare against multiple strings in xslt", I tried this, but with no success:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:ili="http://www.interlis.ch/INTERLIS2.3" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes" />
  <!-- externally-specified parameter -->
  <xsl:param name="pNames">
    <n>ch17fnufWDNTUJxA</n>
    <n>ch17fnufWDNTUJto</n>
    <n>ch17fnufWDNTUJoN</n>
  </xsl:param>
  <xsl:template match="ili:DATASECTION//node()">
    <xsl:if test="@OBJ_ID = $pNames/*">
      <xsl:copy>
        <xsl:apply-templates />
      </xsl:copy>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>

The main goal is to find a compact way to match the whole set of hundreds of OBJ_IDs, without repeating a simple match / select / if again and again. Thanks in advance for your help!


Solution

  • First thing, I would suggest you keep the list of IDs in a separate XML document and pass only the path to the document as the parameter (or hard-code it into the stylesheet). This way you will be dealing with a node-set instead of a string, which will make things much easier.

    Now, in order to satisfy your requirement of:

    when a parent is removed, its children are removed too; when a child is to be kept, its parents are kept as well

    you could try and do something like:

    XSLT 1.0

    <xsl:stylesheet version="1.0" 
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:ili="http://www.interlis.ch/INTERLIS2.3" 
    exclude-result-prefixes="ili">
    <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
    <xsl:strip-space elements="*"/>
    
    <xsl:param name="ids" select="document('ids.xml')/root/n"/>
    
    <!-- identity transform -->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>
    
    <!-- keep an element carrying an OBJ_ID only if it, or one of its descendants, is on the list -->
    <xsl:template match="*[ili:OBJ_ID]">
        <xsl:if test="descendant-or-self::ili:OBJ_ID = $ids">
            <xsl:copy>
                <xsl:apply-templates select="@*|node()"/>
            </xsl:copy>
        </xsl:if>
    </xsl:template> 
    
    </xsl:stylesheet>
    

    where the external ids.xml file has a structure of:

    <root>
        <n>ch17fnufWDNTUJxA</n>
        <n>ch17fnufWDNTUJto</n>
        ...
    </root>
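
    The stylesheet above covers task (a). For task (b), a minimal sketch under the same assumptions (an ids.xml file of the shape just shown, located next to the stylesheet) would invert the test and look only at the element's own OBJ_ID child, so that a listed element is dropped together with everything inside it. The test sits in an xsl:if rather than in the match pattern because XSLT 1.0 does not allow variable references in match patterns. This is untested against the real data:

    ```xml
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:ili="http://www.interlis.ch/INTERLIS2.3"
        exclude-result-prefixes="ili">
      <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
      <xsl:strip-space elements="*"/>

      <!-- assumption: same ids.xml as for task (a), resolved relative to the stylesheet -->
      <xsl:param name="ids" select="document('ids.xml')/root/n"/>

      <!-- identity transform: copy everything that is not handled below -->
      <xsl:template match="@*|node()">
        <xsl:copy>
          <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
      </xsl:template>

      <!-- copy an element with an OBJ_ID child only if that OBJ_ID is NOT on the list;
           a listed element produces no output, so its children disappear with it -->
      <xsl:template match="*[ili:OBJ_ID]">
        <xsl:if test="not(ili:OBJ_ID = $ids)">
          <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
          </xsl:copy>
        </xsl:if>
      </xsl:template>

    </xsl:stylesheet>
    ```

    Note that not(ili:OBJ_ID = $ids) is used instead of ili:OBJ_ID != $ids: with node-sets, != is true as soon as any single pair differs, which would be the case for almost every element.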
    

    Caveats:

    • Not tested very thoroughly;
    • With hundreds of IDs in the list and 20000 - 40000 elements to process, this is not likely to be fast.
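
    If speed does turn out to be a problem, one thing that may help is to look the IDs up through an xsl:key instead of comparing every OBJ_ID against the whole list. The following is only a sketch for task (a), again assuming the ids.xml structure shown above; the key name id-by-value and the variable names are made up for illustration. Since key() only searches the document that contains the context node, the context is switched to ids.xml with xsl:for-each before the lookup. Whether this is actually faster depends on how the processor implements keys:

    ```xml
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:ili="http://www.interlis.ch/INTERLIS2.3"
        exclude-result-prefixes="ili">
      <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
      <xsl:strip-space elements="*"/>

      <!-- index the (no-namespace) n elements of ids.xml by their text value -->
      <xsl:key name="id-by-value" match="n" use="."/>

      <!-- assumption: ids.xml sits next to the stylesheet -->
      <xsl:variable name="ids-doc" select="document('ids.xml')"/>

      <!-- identity transform -->
      <xsl:template match="@*|node()">
        <xsl:copy>
          <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
      </xsl:template>

      <xsl:template match="*[ili:OBJ_ID]">
        <!-- the OBJ_ID of this element and of any nested elements -->
        <xsl:variable name="candidates" select="descendant-or-self::ili:OBJ_ID"/>
        <xsl:variable name="listed">
          <!-- switch the context to ids.xml so that key() searches that document -->
          <xsl:for-each select="$ids-doc">
            <xsl:if test="key('id-by-value', $candidates)">yes</xsl:if>
          </xsl:for-each>
        </xsl:variable>
        <!-- keep the element only if at least one candidate ID was found -->
        <xsl:if test="$listed = 'yes'">
          <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
          </xsl:copy>
        </xsl:if>
      </xsl:template>

    </xsl:stylesheet>
    ```

    For task (b), the same lookup could be used with $candidates set to ili:OBJ_ID and the final test changed to not($listed = 'yes').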