I have large XML files and need to inject processing instructions into them. The locations for the processing instructions are listed as xpath locations. I created a small version of the XML file to give a complete example here.
This is a small sample XML file:
<?xml version="1.0" encoding="UTF-8"?>
<ACT>
<TITLE>
<P>Some title</P>
</TITLE>
<LIST>
<ITEM>
<LABEL>Article 1</LABEL>
<P>Dummy text for article 1.</P>
</ITEM>
<ITEM>
<LABEL>Article 2</LABEL>
<P>Dummy text for article 2.</P>
<LIST>
<ITEM>
<LABEL>Article a</LABEL>
<P>Dummy text for article a.</P>
</ITEM>
<ITEM>
<LABEL>Article b</LABEL>
<P>Dummy text for article b.</P>
</ITEM>
<ITEM>
<LABEL>Article c</LABEL>
<P>Dummy text for article c.</P>
</ITEM>
</LIST>
</ITEM>
<ITEM>
<LABEL>Article 3</LABEL>
<P>Dummy text for article 3.</P>
</ITEM>
</LIST>
</ACT>
The file with xpath locations looks something like this:
<?xml version="1.0" encoding="UTF-8"?>
<report>
<page_break>
<description>NO 1</description>
<xpath_location>/ACT[1]/LIST[1]/ITEM[1]/P[1]</xpath_location>
</page_break>
<page_break>
<description>NO 2</description>
<xpath_location>/ACT[1]/LIST[1]/ITEM[2]/LIST[1]/ITEM[3]/P[1]</xpath_location>
</page_break>
</report>
I tried to use the path() function in XSLT 3 to match the position of the current element against the xpath_location elements in my PageBreaks.xml file, but I am not getting any matches. When I output the path() of the current element in a message, every step is prefixed with Q{}. Adding Q{} to the paths in the file with the target locations did not produce any matches, either.
Here is the XSL I have tried:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
exclude-result-prefixes="xs"
version="3.0">
<xsl:output method="xml" encoding="UTF-8"/>
<xsl:variable name="breaks" as="node()" select="document('Breaks.xml')/report"/>
<xsl:template match="/">
<xsl:apply-templates/>
</xsl:template>
<xsl:template match="@*|text()">
<xsl:copy>
<xsl:apply-templates select="@*, node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="*">
<xsl:variable name="mypath">
<xsl:value-of select="./path()"/>
</xsl:variable>
<xsl:variable name="pi_insert">
<xsl:if test="$breaks//xpath_location[. eq $mypath]">yes</xsl:if>
</xsl:variable>
<xsl:copy>
<xsl:apply-templates select="@*, node()"/>
</xsl:copy>
<xsl:if test="$pi_insert eq 'yes'">
<xsl:processing-instruction name="PAGE">
<xsl:value-of select="$breaks//xpath_location[. eq $mypath]/preceding-sibling::description"/>
</xsl:processing-instruction>
</xsl:if>
</xsl:template>
</xsl:stylesheet>
Doing a string match on the result of path() seems not very robust: it only takes a very slight variation in the way the path is written in the XML document (for example, some whitespace) and then it won't match. I think it would be more reliable to use xsl:evaluate.
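To illustrate the fragility: for an element in no namespace, path() writes every step with an empty namespace part, for example /Q{}ACT[1]/Q{}LIST[1]/Q{}ITEM[1]/Q{}P[1]. The following is only a sketch of how the string comparison could be made to line up. It assumes that none of the elements are in a namespace, and it is meant as a replacement for the match="*" template in the question's stylesheet, so it relies on the $breaks variable and the xs prefix declared there. The variable name normalizedPath is made up for the example.

```xml
<!-- Sketch only: compare a normalized path() string with the stored locations -->
<xsl:template match="*">
  <!-- path() yields e.g. /Q{}ACT[1]/Q{}LIST[1]; strip the empty-namespace marker Q{} -->
  <xsl:variable name="normalizedPath" as="xs:string"
                select="replace(path(), 'Q\{\}', '')"/>
  <xsl:copy>
    <xsl:apply-templates select="@*, node()"/>
  </xsl:copy>
  <!-- normalize-space() guards against stray whitespace around a stored path -->
  <xsl:for-each select="$breaks//xpath_location[normalize-space(.) eq $normalizedPath]">
    <xsl:processing-instruction name="PAGE" select="string(../description)"/>
  </xsl:for-each>
</xsl:template>
```

Even with this normalization, a location written as /ACT/LIST[1]/ITEM[1]/P[1] (without the [1] on ACT) selects the same node but is a different string, so it would still not match. Evaluating the stored path avoids that problem.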
Start by building a map from selected nodes to their descriptions:
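<xsl:variable name="map" as="map(xs:string, xs:string)">
<!-- Remember the main input document; inside xsl:for-each the context item is a page_break -->
<xsl:variable name="root" select="."/>
<xsl:map>
<!-- The element in the breaks file is named page_break (with an underscore) -->
<xsl:for-each select="$breaks//page_break">
<xsl:variable name="selectedNode" as="element(*)">
<xsl:evaluate xpath="xpath_location"
context-item="$root"/>
</xsl:variable>
<!-- key is an XPath expression, not an attribute value template, so no curly braces -->
<!-- string() makes the value match the declared map(xs:string, xs:string) type -->
<xsl:map-entry key="generate-id($selectedNode)" select="string(description)"/>
</xsl:for-each>
</xsl:map>
</xsl:variable>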
and then use this map to expand the relevant nodes:
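<xsl:template match="*[exists($map(generate-id(.)))]">
<!-- Let the next-best rule copy the element itself, then add the PI after it -->
<xsl:next-match/>
<xsl:processing-instruction name="PAGE">
<xsl:value-of select="$map(generate-id(.))"/>
</xsl:processing-instruction>
</xsl:template>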
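Putting the two pieces together, a complete stylesheet might look like the sketch below. Several things in it are assumptions rather than part of the original code: the breaks file is taken to be Breaks.xml next to the stylesheet (the name used in the question's document() call), its entries are taken to be page_break elements as in the sample file, the hand-written copy templates are swapped for xsl:mode on-no-match="shallow-copy", and xsl:next-match is used so that the matched element is still copied before the processing instruction is written. xsl:evaluate is an optional feature in XSLT 3.0, so this also assumes a processor that supports it.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                exclude-result-prefixes="xs"
                version="3.0">

  <xsl:output method="xml" encoding="UTF-8"/>

  <!-- Identity transform for every node that no template below matches -->
  <xsl:mode on-no-match="shallow-copy"/>

  <!-- Assumption: the breaks file is called Breaks.xml and sits next to the stylesheet -->
  <xsl:variable name="breaks" as="element(report)"
                select="document('Breaks.xml')/report"/>

  <!-- The main input document, needed as the context item for xsl:evaluate -->
  <xsl:variable name="root" as="document-node()" select="/"/>

  <!-- Map from generate-id() of each target element to its description.
       Note: two page_break entries pointing at the same element would
       produce a duplicate key, which xsl:map reports as an error. -->
  <xsl:variable name="map" as="map(xs:string, xs:string)">
    <xsl:map>
      <xsl:for-each select="$breaks/page_break">
        <xsl:variable name="selectedNode" as="element()">
          <xsl:evaluate xpath="string(xpath_location)" context-item="$root"/>
        </xsl:variable>
        <xsl:map-entry key="generate-id($selectedNode)"
                       select="string(description)"/>
      </xsl:for-each>
    </xsl:map>
  </xsl:variable>

  <!-- Elements listed in the breaks file: copy them, then append the PI -->
  <xsl:template match="*[exists($map(generate-id(.)))]">
    <xsl:next-match/>
    <xsl:processing-instruction name="PAGE" select="$map(generate-id(.))"/>
  </xsl:template>

</xsl:stylesheet>
```

With the sample files from the question, this should place a PAGE processing instruction containing NO 1 directly after the P of the first ITEM, and one containing NO 2 directly after the P of "Article c". Because the stored paths are evaluated rather than compared as strings, a location written without the positional predicates, or with extra whitespace, would still select the same element.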