
Sorting several XML files by date and merging them into one with XSLT


I have several separate XML files, each containing a historic letter encoded in TEI. I want to merge them into a single file, with the letters ordered by date.

A1.xml

<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:id="1">
    <teiHeader>
        <title>Letter 1</title>
        <date when="19990202" n="0"></date>
    </teiHeader>
    <text>
        <p>Content of letter 1</p>
    </text>
</TEI>

and a second file, A2.xml:

<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:id="2">
    <teiHeader>
        <title>Letter 1</title>
        <date when="20010202" n="0"></date>
    </teiHeader>
    <text>
        <p>Content of letter 2</p>
    </text>
</TEI>

and a third one, A3.xml:

<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:id="3">
    <teiHeader>
        <title>Letter 3</title>
        <date when="18880101" n="0"></date>
    </teiHeader>
    <text>
        <p>Content of letter 3</p>
    </text>
</TEI>

The files are named consecutively, from "A001.xml" to "A999.xml", but their file-name order is not the chronological order I want. So my preferred output would be a single file, letters.xml:

<?xml version="1.0" encoding="UTF-8"?>
<CORRESPONDENCE>
    <TEI xml:id="3">
        <teiHeader>
            <title>Letter 3</title>
            <date when="18880101" n="0"></date>
        </teiHeader>
        <text>
            <p>Content of letter 3</p>
        </text>
    </TEI>
    <TEI xml:id="1">
        <teiHeader>
            <title>Letter 1</title>
            <date when="19990202" n="0"></date>
        </teiHeader>
        <text>
            <p>Content of letter 1</p>
        </text>
    </TEI>
    <TEI xml:id="2">
        <teiHeader>
            <title>Letter 1</title>
            <date when="20010202" n="0"></date>
        </teiHeader>
        <text>
            <p>Content of letter 2</p>
        </text>
    </TEI>
</CORRESPONDENCE>

I have found ways to merge several XML files into one, but I can't get it to work with the letters sorted by date. Is this even possible?


Solution

  • As you simply want to concatenate the XML documents in date order, with Saxon 9 and XSLT 2.0 it is as easy as:

    <xsl:stylesheet
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      xmlns:xs="http://www.w3.org/2001/XMLSchema"
      exclude-result-prefixes="xs"
      version="2.0">
    
    <xsl:param name="file-suffix" as="xs:string" select="'A*.xml'"/>
    
    <xsl:template match="/" name="main">
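      <CORRESPONDENCE>
        <!-- Saxon-specific collection URI: "." is the stylesheet's directory and
             ?select= picks the files whose names match the pattern -->
        <xsl:perform-sort select="collection(concat('.?select=', $file-suffix))/*">
          <!-- sort the root elements numerically by the date in each header -->
          <xsl:sort select="teiHeader/date/xs:integer(@when)"/>
        </xsl:perform-sort>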
      </CORRESPONDENCE>
    </xsl:template>
    
    </xsl:stylesheet>
    

    You would run that with the command-line options -it:main -xsl:stylesheet.xsl (or, if needed, with a primary input document); either way, the documents to be processed are fetched by the collection() call shown above.
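    If you would rather not depend on Saxon's ?select= collection syntax, here is a minimal sketch that uses only standard XSLT 2.0 functions: it builds the names A001.xml to A999.xml with format-number(), skips any number for which doc-available() returns false, and sorts the remaining letters the same way. The parameters prefix and max are hypothetical additions, and the sketch assumes the files sit next to the stylesheet, because relative URIs passed to doc() are resolved against the stylesheet's base URI.

    ```xml
    <xsl:stylesheet
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      xmlns:xs="http://www.w3.org/2001/XMLSchema"
      exclude-result-prefixes="xs"
      version="2.0">

    <!-- hypothetical parameters: file-name prefix and highest file number,
         assuming the files are named A001.xml ... A999.xml -->
    <xsl:param name="prefix" as="xs:string" select="'A'"/>
    <xsl:param name="max" as="xs:integer" select="999"/>

    <xsl:template name="main">
      <!-- build each zero-padded file name and keep only the documents
           that exist and can be parsed -->
      <xsl:variable name="letters" as="element()*"
        select="for $i in 1 to $max,
                    $uri in concat($prefix, format-number($i, '000'), '.xml')
                return if (doc-available($uri)) then doc($uri)/* else ()"/>
      <CORRESPONDENCE>
        <xsl:perform-sort select="$letters">
          <xsl:sort select="teiHeader/date/xs:integer(@when)"/>
        </xsl:perform-sort>
      </CORRESPONDENCE>
    </xsl:template>

    </xsl:stylesheet>
    ```

    Run it the same way, with -it:main. Because xsl:sort is stable by default in XSLT 2.0, letters with identical dates keep their file-name order.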

    If the elements in your input are in the TEI namespace http://www.tei-c.org/ns/1.0, as Abel pointed out in a comment, you need to change the code to the following, which adds xpath-default-namespace:

    <xsl:stylesheet
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      xmlns:xs="http://www.w3.org/2001/XMLSchema"
      xpath-default-namespace="http://www.tei-c.org/ns/1.0"
      exclude-result-prefixes="xs"
      version="2.0">
    
    <xsl:param name="file-suffix" as="xs:string" select="'A*.xml'"/>
    
    <xsl:template match="/" name="main">
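      <CORRESPONDENCE>
        <!-- with xpath-default-namespace set, unprefixed names such as teiHeader
             and date match elements in the TEI namespace -->
        <xsl:perform-sort select="collection(concat('.?select=', $file-suffix))/*">
          <xsl:sort select="teiHeader/date/xs:integer(@when)"/>
        </xsl:perform-sort>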
      </CORRESPONDENCE>
    </xsl:template>
    
    </xsl:stylesheet>
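
    Both stylesheets cast @when to xs:integer, which works for the compact values in the samples (19990202) but raises a dynamic error for the ISO 8601 form that TEI normally uses in @when, such as 1999-02-02. A minimal sketch, assuming every file uses the same fixed-width format, compares the values as strings in Unicode codepoint order instead, which sorts YYYYMMDD and YYYY-MM-DD values chronologically alike. Drop xpath-default-namespace if your input is in no namespace.

    ```xml
    <xsl:stylesheet
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      xmlns:xs="http://www.w3.org/2001/XMLSchema"
      xpath-default-namespace="http://www.tei-c.org/ns/1.0"
      exclude-result-prefixes="xs"
      version="2.0">

    <xsl:param name="file-suffix" as="xs:string" select="'A*.xml'"/>

    <xsl:template match="/" name="main">
      <CORRESPONDENCE>
        <xsl:perform-sort select="collection(concat('.?select=', $file-suffix))/*">
          <!-- compare @when as a string in codepoint order; assumes all files share
               one fixed-width format, e.g. all "1999-02-02" -->
          <!-- (...)[1] keeps the sort key to one item even if a header has several dates -->
          <xsl:sort select="string((teiHeader/date/@when)[1])"
                    collation="http://www.w3.org/2005/xpath-functions/collation/codepoint"/>
        </xsl:perform-sort>
      </CORRESPONDENCE>
    </xsl:template>

    </xsl:stylesheet>
    ```

    A value with lower precision, such as a bare year 1999, sorts before every full date in that year, because it is a prefix of them; a letter with no @when at all gets an empty string and sorts first.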