Tags: xml, xslt, xslt-2.0

XSL match nodes based on ID attributes


I think this is a simple question that I'm not wording properly, but after a few hours I'm stuck.

I have an XML document like this:

<NORMDOC>
    <DOC>
        <TXT>
            <S sid="112233-SENT-001">
                <ENAMEX type="PERSON" id="PER-112233-001">George Washington</ENAMEX> and
                <ENAMEX type="PERSON" id="PER-112233-002">Thomas Jefferson</ENAMEX> were both founding fathers.
            </S>
            <S sid="112233-SENT-002">
                <ENAMEX type="PERSON" id="PER-112233-002">Thomas Jefferson</ENAMEX>
                has a social security number of <IDEX type="SSN" id="SSN-112233-075">222-22-2222</IDEX>.
            </S>
        </TXT>
    </DOC>
    <ENTINFO ID="PER-112233-002"
             TYPE="PERSON"
             NORM="Jefferson, Thomas"
             REFID="PER-112233-002"
             MENTION="Thomas Jefferson"
             GIVEN="Thomas"
             MIDDLE=""
             SURNAME="Jefferson"/>
</NORMDOC>

I am trying to combine the contents of the ENTINFO element and the S elements by matching the ENTINFO's ID attribute against the id attributes of the ENAMEX elements inside each S.

Desired output:

<ENTINFO>
    <ENTINFO_PERSON_NORM>Jefferson, Thomas</ENTINFO_PERSON_NORM>
    <ENTINFO_PERSON_MENTION>Thomas Jefferson</ENTINFO_PERSON_MENTION>
    <ENTINFO_PERSON_GIVEN>Thomas</ENTINFO_PERSON_GIVEN>
    <ENTINFO_PERSON_MIDDLE/>
    <ENTINFO_PERSON_SURNAME>Jefferson</ENTINFO_PERSON_SURNAME>
    <ENTINFO_SSN_NORM>222222222</ENTINFO_SSN_NORM>
    <ENTINFO_SSN_MENTION>social security number of 222-22-2222</ENTINFO_SSN_MENTION>
</ENTINFO>

The part I am having difficulty with is referring to the id inside the S element, using it in a comparison, and pulling the data out of the S element when it matches.
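In the sample input, the S elements carry only a `sid`; the `id` to compare against the ENTINFO's `ID` lives on their ENAMEX children. A minimal XSLT 1.0 sketch, assuming only the sample input above, that keeps the ENTINFO in a variable (`$entity` is a name chosen for illustration) and lists every sentence that mentions it:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text"/>

    <xsl:template match="/">
        <xsl:for-each select="NORMDOC/ENTINFO">
            <!-- Remember the ENTINFO; inside the predicate below, "." is the S being tested -->
            <xsl:variable name="entity" select="."/>
            <!-- S has only a sid; the id to compare lives on its ENAMEX children -->
            <xsl:for-each select="/NORMDOC/DOC/TXT/S[ENAMEX/@id = $entity/@ID]">
                <!-- One line per matching sentence: entity ID and sentence sid -->
                <xsl:value-of select="concat($entity/@ID, ' appears in ', @sid, '&#10;')"/>
            </xsl:for-each>
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>
```

For the sample input this prints `PER-112233-002 appears in 112233-SENT-001` and `PER-112233-002 appears in 112233-SENT-002`: Jefferson is mentioned in both sentences, although only the second one contains the IDEX.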

Here is my XSLT:

<xsl:template match="ENTINFO">
    <xsl:copy>
        <!-- For each ENTINFO attribute, create a new ENTINFO element and append the attribute --> 
        <!-- name to the end of the element name ie ENTINFO ID=myid becomes <ENTINFO_ID>myid</ENTINFO_ID> -->
        <xsl:for-each select="@*">
            <xsl:element name="ENTINFO_{translate(name(), '-', '_')}">
                <xsl:value-of select="." />
            </xsl:element>
        </xsl:for-each>
        <!-- This code does not match anything so Mr. Jefferson's SSN never gets pulled in -->
        <xsl:if test="NORMDOC/DOC/TXT/S/IDEX[@id]=@ID">
            <xsl:for-each select="NORMDOC/DOC/TXT/S[@*]">
                <xsl:element name="ENTINFO_{translate(name(), '-', '_')}">
                    <xsl:value-of select="." />
                </xsl:element>
            </xsl:for-each>
        </xsl:if>
    </xsl:copy>
</xsl:template>

The first chunk of code works, and I get the appended ENTINFO tags that I wanted, but the SSN isn't getting matched and pulled correctly from the IDEX element. The second block of code has no effect.
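A likely explanation for the second block having no effect: inside `<xsl:template match="ENTINFO">`, the path `NORMDOC/DOC/TXT/S/...` is evaluated relative to the ENTINFO element, which has no children, so it selects nothing. Even with a correct path, `IDEX[@id]` only tests that the attribute exists, and `=@ID` then compares the IDEX text (`222-22-2222`) with the ENTINFO's ID (`PER-112233-002`), which never match. What links the person to the SSN is the S element containing both. A minimal XSLT 1.0 sketch of the same template with an absolute path and `current()`, assuming an output element named after the IDEX `type` (here `ENTINFO_SSN`) is acceptable:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" indent="yes"/>
    <xsl:strip-space elements="*"/>

    <!-- Only the ENTINFO part of the output is produced here -->
    <xsl:template match="/NORMDOC">
        <ENTITIES>
            <xsl:apply-templates select="ENTINFO"/>
        </ENTITIES>
    </xsl:template>

    <xsl:template match="ENTINFO">
        <xsl:copy>
            <!-- Attributes to elements, unchanged from the question -->
            <xsl:for-each select="@*">
                <xsl:element name="ENTINFO_{translate(name(), '-', '_')}">
                    <xsl:value-of select="."/>
                </xsl:element>
            </xsl:for-each>
            <!-- Absolute path, because a relative "NORMDOC/..." from ENTINFO finds nothing.
                 current()/@ID is this ENTINFO's ID; a bare @ID inside the predicate
                 would refer to the S element being tested. -->
            <xsl:for-each select="/NORMDOC/DOC/TXT/S[ENAMEX/@id = current()/@ID]/IDEX">
                <xsl:element name="ENTINFO_{@type}">
                    <xsl:value-of select="."/>
                </xsl:element>
            </xsl:for-each>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>
```

With the sample input this adds `<ENTINFO_SSN>222-22-2222</ENTINFO_SSN>` after the attribute-derived elements.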

Here is the actual output (I'm only concerned with the ENTINFO for now and will deal with the rest of the output later):

<NORMDOC>
   <DOC>
      <RAW_TXT>George Washington and Thomas Jefferson were both founding fathers.Thomas Jefferson has a social security number of 222-22-2222.</RAW_TXT>
      <TXT>
         <S>
            <ENAMEX_PERSON>George Washington</ENAMEX_PERSON>
            <ENAMEX_PERSON>Thomas Jefferson</ENAMEX_PERSON>
         </S>
         <S>
            <ENAMEX_PERSON>Thomas Jefferson</ENAMEX_PERSON>
            <IDEX_SSN>222-22-2222</IDEX_SSN>
         </S>
      </TXT>
   </DOC>
   <ENTITIES>
      <ENTINFO>
         <ENTINFO_ID>PER-112233-002</ENTINFO_ID>
         <ENTINFO_TYPE>PERSON</ENTINFO_TYPE>
         <ENTINFO_NORM>Jefferson, Thomas</ENTINFO_NORM>
         <ENTINFO_REFID>PER-112233-002</ENTINFO_REFID>
         <ENTINFO_MENTION>Thomas Jefferson</ENTINFO_MENTION>
         <ENTINFO_GIVEN>Thomas</ENTINFO_GIVEN>
         <ENTINFO_MIDDLE/>
         <ENTINFO_SURNAME>Jefferson</ENTINFO_SURNAME>
      </ENTINFO>
   </ENTITIES>
</NORMDOC>

Solution

  • Cross-references are best resolved using a key. I couldn't understand the logic of your expected output; see if the following can get you started:

    <xsl:stylesheet version="1.0" 
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
    <xsl:strip-space elements="*"/>
    
    <xsl:key name="k" match="S" use="ENAMEX/@id" />
    
    <xsl:template match="/NORMDOC">
        <root>
            <xsl:for-each select="ENTINFO">
                <xsl:copy>
                    <!-- attributes to elements -->
                    <xsl:for-each select="@*">
                        <xsl:element name="ENTINFO_{translate(name(), '-', '_')}">
                            <xsl:value-of select="." />
                        </xsl:element>
                    </xsl:for-each>
                    <!-- mentions by ID -->
                    <xsl:for-each select="key('k', @ID)">
                        <ENTINFO_SSN_MENTION>
                            <xsl:value-of select="." />
                        </ENTINFO_SSN_MENTION>
                    </xsl:for-each>
                </xsl:copy>
            </xsl:for-each>
        </root>
    </xsl:template>
    
    </xsl:stylesheet>
    

    Demo: https://xsltfiddle.liberty-development.net/3NSSEux
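Building on the key above, here is a sketch in XSLT 2.0 (the question is tagged xslt-2.0, so it needs a 2.0 processor such as Saxon) that aims at the shape of the desired output. Because the question does not spell out the rules, it assumes that ID, TYPE and REFID are dropped, that the remaining attributes are prefixed with the entity's TYPE, that NORM is the IDEX value with hyphens removed, and that MENTION is the text immediately before the IDEX followed by its value. The key name `sentence-by-entity` is arbitrary.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" indent="yes"/>
    <xsl:strip-space elements="*"/>

    <!-- Index each sentence (S) by the ids of the ENAMEX mentions it contains -->
    <xsl:key name="sentence-by-entity" match="S" use="ENAMEX/@id"/>

    <!-- Emit only the ENTINFO part; the rest of the document is ignored here -->
    <xsl:template match="/NORMDOC">
        <xsl:apply-templates select="ENTINFO"/>
    </xsl:template>

    <xsl:template match="ENTINFO">
        <xsl:copy>
            <!-- Assumed rule: skip bookkeeping attributes, prefix the rest with TYPE -->
            <xsl:for-each select="@* except (@ID, @TYPE, @REFID)">
                <xsl:element name="ENTINFO_{../@TYPE}_{name()}">
                    <xsl:value-of select="."/>
                </xsl:element>
            </xsl:for-each>
            <!-- Every IDEX in a sentence that mentions this entity -->
            <xsl:for-each select="key('sentence-by-entity', @ID)/IDEX">
                <!-- Assumed rule: NORM is the value without hyphens -->
                <xsl:element name="ENTINFO_{@type}_NORM">
                    <xsl:value-of select="translate(., '-', '')"/>
                </xsl:element>
                <!-- Assumed rule: MENTION is the preceding text plus the value -->
                <xsl:element name="ENTINFO_{@type}_MENTION">
                    <xsl:value-of select="concat(normalize-space(preceding-sibling::text()[1]), ' ', .)"/>
                </xsl:element>
            </xsl:for-each>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>
```

Under these assumptions the MENTION comes out as `has a social security number of 222-22-2222` rather than the exact text in the desired output; dropping the leading "has a" would need a rule the question doesn't give. If the input can contain more than one ENTINFO, wrap the result in a root element (for example the ENTITIES element from the actual output) so the output stays well-formed.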