xml xslt attributes xslt-2.0 transformation

Incorporate values from one XML file into another


I have the following two XML files (let's call them paragraph.xml and sentence.xml).

paragraph.xml

<?xml version="1.0" encoding="UTF-8"?>
<paragraphs>
     <paragraph id="par_1" parBegin="1" parEnd="100" par_type="intro" context="positive"/>
     <paragraph id="par_2" parBegin="101" parEnd="170" par_type="elaboration" context="negative"/>
     <paragraph id="par_3" parBegin="171" parEnd="210" par_type="elaboration" context="positive"/>
     <paragraph id="par_4" parBegin="211" parEnd="280" par_type="conclusion" context="neutral"/>
</paragraphs>

In paragraph.xml, the attribute "parBegin" gives the number of the word with which the paragraph starts, and "parEnd" gives the number of the word with which it ends. For example, the first paragraph element starts with word 1 (the value of its parBegin attribute) and ends with word 100, so the first paragraph has 100 words.

The other XML file, sentence.xml, holds information about the sentences of the same text.

<?xml version="1.0" encoding="UTF-8"?>
 <sentences>
     <sentence id="sent_1" sentBegin="1" sentEnd="15" sent_type="question"/>
     <sentence id="sent_2" sentBegin="16" sentEnd="30" sent_type="imperative"/>
     <sentence id="sent_3" sentBegin="31" sentEnd="37" sent_type="confirmation"/>
     ...
     <sentence id="sent_15" sentBegin="120" sentEnd="125" sent_type="conclusion" />
 </sentences>

In sentence.xml, the attribute "sentBegin" gives the number of the word with which the sentence starts, and "sentEnd" gives the number of the word with which the sentence ends. For example, the first sentence element starts with word 1 (the value of its sentBegin attribute) and ends with word 15. The sentence with id="sent_15" starts with word 120 (sentBegin="120") and ends with word 125 (sentEnd="125").

What I want to do is find out which paragraph each sentence belongs to. In other words, I want to compare the value of the sentence's @sentEnd attribute with the @parBegin and @parEnd values of each paragraph. If @sentEnd is at least the @parBegin and at most the @parEnd of a paragraph element, the sentence belongs to that paragraph. For example, the sentEnd value of the sentence with id="sent_15" is 125 (sentEnd="125"), which is bigger than the @parBegin value (parBegin="101") of the paragraph with id="par_2" and smaller than its @parEnd value (parEnd="170"). This shows that the sentence with id="sent_15" belongs to the paragraph with id="par_2". The desired output looks like this:

<?xml version="1.0" encoding="UTF-8"?>
 <sentences>
     <sentence id="sent_1" sentBegin="1" sentEnd="15" sent_type="question" paragraph="par_1" par_type="intro"/>
     <sentence id="sent_2" sentBegin="16" sentEnd="30" sent_type="imperative" paragraph="par_1" par_type="intro"/>
     <sentence id="sent_3" sentBegin="31" sentEnd="37" sent_type="confirmation" paragraph="par_1" par_type="intro"/>
     ...
     <sentence id="sent_15" sentBegin="120" sentEnd="125" sent_type="conclusion" paragraph="par_2" par_type="elaboration" />
 </sentences>

Thanks a lot for your feedback/solution.


Solution

  • It looks like you can simply select the right paragraph with a predicate:

      <xsl:template match="sentence">
          <xsl:copy>
              <xsl:apply-templates 
                 select="@*, 
                         $paragraph-doc/paragraphs/paragraph[xs:integer(@parBegin) &lt;= xs:integer(current()/@sentBegin) and xs:integer(@parEnd) >= xs:integer(current()/@sentEnd)]/(@id, @par_type)"/>
          </xsl:copy>
      </xsl:template>
    

    In the following stylesheet I have inlined the paragraph document in a parameter, but you could of course load it with the doc function instead:

    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:xs="http://www.w3.org/2001/XMLSchema"
        exclude-result-prefixes="#all"
        version="3.0">
    
      <xsl:param name="paragraph-doc">
        <paragraphs>
             <paragraph id="par_1" parBegin="1" parEnd="100" par_type="intro" context="positive"/>
             <paragraph id="par_2" parBegin="101" parEnd="170" par_type="elaboration" context="negative"/>
             <paragraph id="par_3" parBegin="171" parEnd="210" par_type="elaboration" context="positive"/>
             <paragraph id="par_4" parBegin="211" parEnd="280" par_type="conclusion" context="neutral"/>
        </paragraphs>
      </xsl:param>
    
      <xsl:mode on-no-match="shallow-copy"/>
    
      <xsl:template match="sentence">
          <xsl:copy>
              <xsl:apply-templates 
                 select="@*, 
                         $paragraph-doc/paragraphs/paragraph[xs:integer(@parBegin) &lt;= xs:integer(current()/@sentBegin) and xs:integer(@parEnd) >= xs:integer(current()/@sentEnd)]/(@id, @par_type)"/>
          </xsl:copy>
      </xsl:template>
    
      <xsl:template match="paragraph/@id">
          <xsl:attribute name="paragraph" select="."/>
      </xsl:template>
    
    </xsl:stylesheet>
    

    https://xsltfiddle.liberty-development.net/nc4NzQZ is an XSLT 3 sample; for XSLT 2 you would need to replace the xsl:mode declaration with the identity transformation template.
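
    For an XSLT 2 processor, the two adjustments mentioned so far (loading the paragraph document with doc and replacing xsl:mode with the identity template) might be combined roughly as in the following sketch. The parameter name paragraph-uri and its default value 'paragraph.xml' are assumptions made for illustration; a relative URI passed to doc is normally resolved against the base URI of the stylesheet.

    ```xml
    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:xs="http://www.w3.org/2001/XMLSchema"
        exclude-result-prefixes="#all"
        version="2.0">

      <!-- Assumption: the paragraph file is named paragraph.xml and can be
           found relative to this stylesheet; pass another URI to override. -->
      <xsl:param name="paragraph-uri" as="xs:string" select="'paragraph.xml'"/>

      <!-- Load the paragraph document once instead of inlining it. -->
      <xsl:variable name="paragraph-doc" select="doc($paragraph-uri)"/>

      <!-- Identity transformation: the XSLT 2 replacement for
           xsl:mode on-no-match="shallow-copy". -->
      <xsl:template match="@* | node()">
        <xsl:copy>
          <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
      </xsl:template>

      <!-- Copy each sentence and append the id and par_type attributes
           of the paragraph whose word range encloses the sentence. -->
      <xsl:template match="sentence">
        <xsl:copy>
          <xsl:apply-templates
             select="@*,
                     $paragraph-doc/paragraphs/paragraph[xs:integer(@parBegin) &lt;= xs:integer(current()/@sentBegin) and xs:integer(@parEnd) >= xs:integer(current()/@sentEnd)]/(@id, @par_type)"/>
        </xsl:copy>
      </xsl:template>

      <!-- Rename the paragraph's id attribute to paragraph on output. -->
      <xsl:template match="paragraph/@id">
        <xsl:attribute name="paragraph" select="."/>
      </xsl:template>

    </xsl:stylesheet>
    ```

    Run against a well-formed sentence.xml, this should add the paragraph and par_type attributes to each sentence in the same way as the XSLT 3 version.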

    As a refinement or alternative to the above, we could key the paragraph elements on the integer range @parBegin to @parEnd and then use that key to find the relevant paragraph for a sentence:

    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:xs="http://www.w3.org/2001/XMLSchema"
        exclude-result-prefixes="#all"
        version="3.0">
    
      <xsl:param name="paragraph-doc">
        <paragraphs>
             <paragraph id="par_1" parBegin="1" parEnd="100" par_type="intro" context="positive"/>
             <paragraph id="par_2" parBegin="101" parEnd="170" par_type="elaboration" context="negative"/>
             <paragraph id="par_3" parBegin="171" parEnd="210" par_type="elaboration" context="positive"/>
             <paragraph id="par_4" parBegin="211" parEnd="280" par_type="conclusion" context="neutral"/>
        </paragraphs>
      </xsl:param>
    
      <xsl:key name="par-ref" match="paragraph" use="@parBegin to @parEnd"/>
    
      <xsl:mode on-no-match="shallow-copy"/>
    
      <xsl:template match="sentence">
          <xsl:copy>
              <xsl:apply-templates 
                 select="@*, 
                         key('par-ref', xs:integer(@sentEnd), $paragraph-doc)/(@id, @par_type)"/>
          </xsl:copy>
      </xsl:template>
    
      <xsl:template match="paragraph/@id">
          <xsl:attribute name="paragraph" select="."/>
      </xsl:template>
    
    </xsl:stylesheet>
    

    https://xsltfiddle.liberty-development.net/nc4NzQZ/2
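
    To see why the key lookup works: the expression @parBegin to @parEnd expands to every integer in that range, so each paragraph is indexed once for every word number it covers. The following is only a small sketch to illustrate that, using just the first two paragraphs; the lookups and lookup output elements and the chosen word numbers are made up for the demonstration, and the stylesheet can be run against any XML input.

    ```xml
    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:xs="http://www.w3.org/2001/XMLSchema"
        exclude-result-prefixes="#all"
        version="2.0">

      <xsl:output indent="yes"/>

      <!-- Two paragraphs from the question, inlined as a temporary tree. -->
      <xsl:variable name="paragraph-doc">
        <paragraphs>
          <paragraph id="par_1" parBegin="1" parEnd="100" par_type="intro" context="positive"/>
          <paragraph id="par_2" parBegin="101" parEnd="170" par_type="elaboration" context="negative"/>
        </paragraphs>
      </xsl:variable>

      <!-- Each paragraph gets one integer key value per word it contains. -->
      <xsl:key name="par-ref" match="paragraph" use="@parBegin to @parEnd"/>

      <!-- Hypothetical demo output: look up a few word numbers directly. -->
      <xsl:template match="/">
        <lookups>
          <xsl:for-each select="(1, 100, 101, 125)">
            <lookup word="{.}" paragraph="{key('par-ref', ., $paragraph-doc)/@id}"/>
          </xsl:for-each>
        </lookups>
      </xsl:template>

    </xsl:stylesheet>
    ```

    Words 1 and 100 resolve to par_1, and words 101 and 125 resolve to par_2. Note that the key version above is consulted with @sentEnd only, so a sentence is assigned to the paragraph containing its last word, whereas the predicate version also requires @sentBegin to fall inside the same paragraph.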