I'm working on my first major XSLT project and am still a novice, so please bear with me.
Our group is transforming existing XML files into an entirely different tagging system. I have come up with a way of processing the MathType callouts (marked by "${TEXT}") using xsl:analyze-string. What I can't work out is how to handle markup such as the italic tags (the I elements), which need to be kept in the result.
I tried using xsl:copy-of in the xsl:non-matching-substring, but that doesn't seem to work. xsl:value-of, of course, gives me everything except the italic tags.
I realize the variable ($stemString) is superfluous at this point. I went down that path hoping I could find something that would let copy-of work, but so far, no luck.
Sample input:
<stem>What is the value of <I>f</I>(<I>x</I>) when ${##A112800eqn01:3}</stem>
My current XSLT:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="assessmentItem">
    <!--SNIP-->
    <xsl:apply-templates select="stemArea/stem"/>
    <!--SNIP-->
  </xsl:template>
  <xsl:template match="stem">
    <xsl:variable name="stemString">
      <xsl:copy-of select="./* | ./text()"/>
    </xsl:variable>
    <xsl:choose>
      <!--Tests for empty stems that aren't art callouts-->
      <xsl:when test=". = '' and @type!='art'"></xsl:when>
      <xsl:when test=". = ' ' and @type!='art'"></xsl:when>
      <!--Test for art callouts-->
      <xsl:when test="@type='art'"><p><img alt="{@loc}" height="10" id="{@loc}" label="" longdesc="normal" src="{@loc}" width="10"/></p></xsl:when>
      <!--Test for boxed text-->
      <xsl:when test="@style='box' or @style='boxL'"><p><span label="Tag_7">
        <xsl:copy-of select="./* | ./text()"></xsl:copy-of>
      </span></p></xsl:when>
      <xsl:otherwise><p>
        <!--Are MathType tokens present in stem?-->
        <xsl:analyze-string regex="(\$\{{.+\}})" select="$stemString">
          <!--If MathType tokens are in stem, do the following-->
          <xsl:matching-substring>
            <xsl:analyze-string regex="(\$\{{)(##.+[eqn|art]\d+)([^a-zA-Z0-9]?.*\}})" select=".">
              <xsl:matching-substring>
                <img alt="{regex-group(2)}" height="10" id="{regex-group(2)}" label="" longdesc="normal" src="{regex-group(2)}" width="10"/>
              </xsl:matching-substring>
              <xsl:non-matching-substring>
                <xsl:text>ERROR</xsl:text>
              </xsl:non-matching-substring>
            </xsl:analyze-string>
          </xsl:matching-substring>
          <!--No MathType tokens in string-->
          <xsl:non-matching-substring>
            <xsl:value-of select="."/>
          </xsl:non-matching-substring>
        </xsl:analyze-string>
      </p></xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
Desired Output:
<p>What is the value of <I>f</I>(<I>x</I>) when <img alt="##A112800eqn01" height="10" id="##A112800eqn01" label="" longdesc="normal" src="##A112800eqn01" width="10"/></p>
What I'm getting:
<p>What is the value of f(x) when <img alt="##A112800eqn01" height="10" id="##A112800eqn01" label="" longdesc="normal" src="##A112800eqn01" width="10"/></p>
Anyone have any ideas for how to proceed?
@Martin Honnen: Thank you for the response. Your code solves the problem.
However, I have an additional issue. When there is more than one MathType callout in a stem, the output is wrong. I am sure the cause is my regex not capturing the callouts properly, but I have hammered on this for a while to no avail. The issue is illustrated below.
Sample input:
<stem type="text">What is the value of <I>f</I>(<I>x</I>) when ${##A112800eqn01:3}, and ${##A112800eqn02:3} is 3.</stem>
Desired Output:
<p>What is the value of <I>f</I>(<I>x</I>) when <img alt="##A112800eqn01" height="10" id="##A112800eqn01" label="" longdesc="normal" src="##A112800eqn01" width="10"/>, and <img alt="##A112800eqn02" height="10" id="##A112800eqn02" label="" longdesc="normal" src="##A112800eqn02" width="10"/> is 3.</p>
What I'm getting:
<p>What is the value of <I>f</I>(<I>x</I>) when <img alt="##A112800eqn01:3}, and ${##A112800eqn02" height="10" id="##A112800eqn01:3}, and ${##A112800eqn02" label="" longdesc="normal" src="##A112800eqn01:3}, and ${##A112800eqn02" width="10"/> is 3.</p>
Don't match on an element and then put an xsl:choose inside the template to distinguish further; instead, simply write templates for the different elements, or for elements with certain attribute values.
And if you want to use analyze-string, do that in a template for a text node, not in the template for an element containing mixed content:
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="assessmentItem">
    <!--SNIP-->
    <xsl:apply-templates select="stemArea/stem"/>
    <!--SNIP-->
  </xsl:template>
  <xsl:template match="stem[. = '' and @type!='art'] | stem[. = ' ' and @type != 'art']"/>
  <xsl:template match="stem[@style='box' or @style='boxL']">
    <p><span label="Tag_7"><xsl:apply-templates/></span></p>
  </xsl:template>
  <xsl:template match="stem[.//text()[matches(., '\$\{.+\}')]]">
    <p>
      <xsl:apply-templates/>
    </p>
  </xsl:template>
  <xsl:template match="stem//text()[matches(., '\$\{.+\}')]">
    <!-- (eqn|art) is an alternation; the character class [eqn|art] would match only one of the characters e, q, n, |, a, r, t -->
    <xsl:analyze-string regex="(\$\{{)(##.+(eqn|art)\d+)([^a-zA-Z0-9]?.*\}})" select=".">
      <xsl:matching-substring>
        <img alt="{regex-group(2)}" height="10" id="{regex-group(2)}" label="" longdesc="normal" src="{regex-group(2)}" width="10"/>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>
</xsl:stylesheet>
With that stylesheet, when applied to the input
<stem>What is the value of <I>f</I>(<I>x</I>) when ${##A112800eqn01:3}</stem>
I get the result
<p>What is the value of <I>f</I>(<I>x</I>) when <img alt="##A112800eqn01" height="10" id="##A112800eqn01" label="" longdesc="normal" src="##A112800eqn01" width="10"/></p>
The above is meant as a suggestion on how to approach your stylesheet design. It is likely not a complete solution, as I don't have many input samples to test with and don't know the input XML and text format you are trying to process.
I would probably implement
<xsl:template match="stem[. = '' and @type!='art'] | stem[. = ' ' and @type != 'art']"/>
as
<xsl:template match="stem[not(normalize-space()) and @type!='art']"/>
instead, but I have mainly tried to show how to structure the stylesheet with templates and how to match on a descendant text node of stem, so that analyze-string does not swallow the element nodes inside stem.
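The first stylesheet in this answer leaves out the art-callout branch. Also, a stem with neither a MathType token nor a box style falls through to the identity template, so it is copied as a stem element instead of becoming a p. The following is a sketch, not a tested drop-in, that maps each branch of the original xsl:choose onto its own template. It assumes the stem attributes are exactly the ones used in the question (@type, @style, @loc). The explicit priority values reproduce the order of the xsl:when tests. They also avoid a template conflict: without them, a boxed stem that also contains a token would match two templates with the same default priority (0.5). Note that @type != 'art' is false when the type attribute is missing, so this sketch uses not(@type = 'art') for the empty-stem test. The stem//text() template with analyze-string can be added unchanged to handle the MathType tokens.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: copies I elements and anything not handled below -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <!-- Empty or whitespace-only stems that are not art callouts produce nothing.
       not(@type = 'art') is also true when the type attribute is absent,
       whereas @type != 'art' is false in that case. -->
  <xsl:template match="stem[not(normalize-space()) and not(@type = 'art')]" priority="3"/>

  <!-- Art callouts: same output as the original xsl:when branch -->
  <xsl:template match="stem[@type = 'art']" priority="2">
    <p><img alt="{@loc}" height="10" id="{@loc}" label="" longdesc="normal" src="{@loc}" width="10"/></p>
  </xsl:template>

  <!-- Boxed text: child elements are processed, not flattened to text -->
  <xsl:template match="stem[@style = 'box' or @style = 'boxL']" priority="1">
    <p><span label="Tag_7"><xsl:apply-templates/></span></p>
  </xsl:template>

  <!-- Every other stem, with or without MathType tokens (the xsl:otherwise branch);
       a text-node template is responsible for replacing the tokens -->
  <xsl:template match="stem">
    <p><xsl:apply-templates/></p>
  </xsl:template>

</xsl:stylesheet>
```

Because each case is its own template, adding a new stem variant later means adding one template with a suitable priority rather than editing a growing xsl:choose.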
As for your edited input requirement, I have changed the regular expressions to use non-greedy quantifiers (.+? and .*?), so with the code below you should be able to match several patterns in one stem and create several img elements:
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="assessmentItem">
    <!--SNIP-->
    <xsl:apply-templates select="stemArea/stem"/>
    <!--SNIP-->
  </xsl:template>
  <xsl:template match="stem[. = '' and @type!='art'] | stem[. = ' ' and @type != 'art']"/>
  <xsl:template match="stem[@style='box' or @style='boxL']">
    <p><span label="Tag_7"><xsl:apply-templates/></span></p>
  </xsl:template>
  <xsl:template match="stem[.//text()[matches(., '\$\{.+?\}')]]">
    <p>
      <xsl:apply-templates/>
    </p>
  </xsl:template>
  <xsl:template match="stem//text()[matches(., '\$\{.+?\}')]">
    <!-- (eqn|art) is an alternation; the character class [eqn|art] would match only one of the characters e, q, n, |, a, r, t -->
    <xsl:analyze-string regex="(\$\{{)(##.+?(eqn|art)\d+)([^a-zA-Z0-9]?.*?\}})" select=".">
      <xsl:matching-substring>
        <img alt="{regex-group(2)}" height="10" id="{regex-group(2)}" label="" longdesc="normal" src="{regex-group(2)}" width="10"/>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>
</xsl:stylesheet>
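The non-greedy version still relies on .+? and .*? stopping at the right place. A possibly more robust variant, sketched here as an assumption rather than a tested drop-in, uses the negated class [^}] so that a match can never run past the closing brace of one ${...} token into the next, and an alternation (eqn|art) for the equation/art marker. With only one capturing group before the alternation, the identifier is regex-group(1). The stem template is simplified to always produce a p; in a full stylesheet it would sit alongside the other stem templates.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: I elements and other markup are copied unchanged -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <!-- Simplified for this sketch: every stem becomes a p -->
  <xsl:template match="stem">
    <p><xsl:apply-templates/></p>
  </xsl:template>

  <!-- Only text nodes are passed to analyze-string, so element children survive.
       The regex attribute is an attribute value template, hence the doubled braces;
       after AVT processing the regex is  \$\{(##[^}]*?(eqn|art)\d+)[^}]*\}
       [^}] cannot cross a closing brace, so each token is matched on its own. -->
  <xsl:template match="stem//text()[matches(., '\$\{[^}]*\}')]">
    <xsl:analyze-string select="." regex="\$\{{(##[^}}]*?(eqn|art)\d+)[^}}]*\}}">
      <xsl:matching-substring>
        <!-- regex-group(1) is the ## identifier, e.g. ##A112800eqn01 -->
        <img alt="{regex-group(1)}" height="10" id="{regex-group(1)}" label="" longdesc="normal" src="{regex-group(1)}" width="10"/>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>
```

For the two-callout sample input, this should produce two img elements whose alt values are ##A112800eqn01 and ##A112800eqn02, with the I elements copied through by the identity template. A ${...} token whose identifier does not contain eqn or art followed by digits is left in the output as plain text.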