I used XSLT a few years ago, and while I wasn't an expert I could write basic transformations. Now I am having issues which I don't understand.
Here I am trying to extract a Dublin Core record from a FOXML record. The Dublin Core record is in XML, and FOXML is basically an XML standard that groups lots of XML records.
Here is my XML:
<?xml version="1.0" encoding="UTF-8"?>
<foxml:digitalObject VERSION="1.1" PID="vital:26113"
xmlns:foxml="info:fedora/fedora-system:def/foxml#"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="info:fedora/fedora-system:def/foxml# http://www.fedora.info/definitions/1/0/foxml1-1.xsd">
<foxml:objectProperties>
<foxml:property NAME="info:fedora/fedora-system:def/model#state" VALUE="Active"/>
<foxml:property NAME="info:fedora/fedora-system:def/model#label" VALUE="DCity/DCCPC_DC.xml"/>
<foxml:property NAME="info:fedora/fedora-system:def/model#ownerId" VALUE=""/>
<foxml:property NAME="info:fedora/fedora-system:def/model#createdDate"
VALUE="2016-09-06T19:49:51.257Z"/>
<foxml:property NAME="info:fedora/fedora-system:def/view#lastModifiedDate"
VALUE="2016-09-27T13:23:10.950Z"/>
<foxml:extproperty NAME="http://www.w3.org/1999/02/22-rdf-syntax-ns#type" VALUE="FedoraObject"/>
<foxml:extproperty NAME="info:fedora/fedora-system:def/model#contentModel" VALUE=""/>
</foxml:objectProperties>
<foxml:datastream ID="DC" STATE="A" CONTROL_GROUP="X" VERSIONABLE="true">
<foxml:datastreamVersion ID="DC.0" LABEL="Dublin Core for this Record"
CREATED="2016-09-06T19:49:51.290Z" MIMETYPE="text/xml"
FORMAT_URI="http://www.openarchives.org/OAI/2.0/oai_dc/" SIZE="653">
<foxml:xmlContent>
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
<dc:title>St. Patricks</dc:title>
<dc:creator>Mary Mooney</dc:creator>
<dc:publisher>Publisher</dc:publisher>
<dc:format>Photograph</dc:format>
<dc:identifier>123456</dc:identifier>
<dc:identifier>100.jpg</dc:identifier>
<dc:coverage>1984</dc:coverage>
<dc:rights>Publisher</dc:rights>
</oai_dc:dc>
</foxml:xmlContent>
</foxml:datastreamVersion>
<foxml:datastreamVersion ID="DC.1" LABEL="Dublin Core for this Record"
CREATED="2016-09-27T13:23:10.894Z" MIMETYPE="text/xml"
FORMAT_URI="http://www.openarchives.org/OAI/2.0/oai_dc/" SIZE="653">
<foxml:xmlContent>
<oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
<dc:title>St Audoen's</dc:title>
<dc:creator>William Mooney</dc:creator>
<dc:publisher>Publisher</dc:publisher>
<dc:format>Photograph</dc:format>
<dc:identifier>10987654</dc:identifier>
<dc:identifier>200.jpg</dc:identifier>
<dc:coverage>1984</dc:coverage>
<dc:rights>Publisher</dc:rights>
</oai_dc:dc>
</foxml:xmlContent>
</foxml:datastreamVersion>
</foxml:datastream>
</foxml:digitalObject>
And here is my XSLT:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:audit="info:fedora/fedora-system:def/audit#" xmlns:premis="http://www.loc.gov/standards/premis/v1"
exclude-result-prefixes="xs"
version="2"
xmlns:foxml="info:fedora/fedora-system:def/foxml#"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="info:fedora/fedora-system:def/foxml# http://www.fedora.info/definitions/1/0/foxml1-1.xsd">
<xsl:output method="xml" indent="yes" name="xml"/>
<xsl:template match="/foxml:digitalObject/foxml:datastreamVersion[@ID eq DC.1]/foxml:xmlContent">
<metadata>
<xsl:value-of select="oai_dc:dc"/>
<xsl:copy-of select="."/>
</metadata>
</xsl:template>
</xsl:stylesheet>
I would expect the DC section of the foxml:datastreamVersion with the ID=DC.1 to be returned. Instead I get the following:
<?xml version="1.0" encoding="UTF-8"?>
St. Patricks
Mary Mooney
Publisher
Photograph
123456
100.jpg
1984
Publisher
St Audoen's
William Mooney
Publisher
Photograph
10987654
200.jpg
1984
Publisher
So I have two obvious problems:
1. Why is it selecting material from the node that doesn't match the attribute I have selected?
2. Why is it only returning the text, rather than the accompanying element tags etc.?
I am using oXygen 19.1 with the Saxon-EE.9.7.0.19 transformer.
Firstly, there is a problem with your match expression. It should be this:
/foxml:digitalObject/foxml:datastream/foxml:datastreamVersion[@ID eq 'DC.1']/foxml:xmlContent
You had left foxml:datastream out of the path. Also, DC.1 needs to be put in apostrophes to make it a string literal; unquoted, it is read as an element name.
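To see the effect of the missing apostrophes in isolation, here is a minimal sketch (not part of the fix itself) that just counts how many foxml:datastreamVersion elements each form of the predicate selects. The use of // and of text output is only for the purpose of the demonstration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:foxml="info:fedora/fedora-system:def/foxml#"
    version="2.0">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <!-- Unquoted: DC.1 is parsed as the name of a child element.
         There is no such element, so the comparison yields nothing
         and the predicate never selects anything. -->
    <xsl:text>unquoted: </xsl:text>
    <xsl:value-of select="count(//foxml:datastreamVersion[@ID eq DC.1])"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Quoted: 'DC.1' is a string literal compared with the ID attribute. -->
    <xsl:text>quoted: </xsl:text>
    <xsl:value-of select="count(//foxml:datastreamVersion[@ID eq 'DC.1'])"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

Run against the FOXML in the question, I would expect this to report 0 for the unquoted form and 1 for the quoted form.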
However, in answer to your question "why is it selecting material from the node that doesn't match the attribute", the answer is: because of XSLT's built-in templates.
When XSLT starts processing, it looks for a template matching the document node /. You have no such template in your XSLT, so the built-in template kicks in. Effectively, it is the equivalent of having these two templates in your XSLT:
<xsl:template match="*|/">
<xsl:apply-templates/>
</xsl:template>
<xsl:template match="text()|@*">
<xsl:value-of select="."/>
</xsl:template>
These skip over elements but output text wherever they find it, which is why all the text in the document ends up in the output. To stop this happening, add this template to your XSLT:
<xsl:template match="node()">
<xsl:apply-templates />
</xsl:template>
(In XSLT 3.0, use <xsl:mode on-no-match="shallow-skip" /> instead.)
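As a rough sketch of that XSLT 3.0 variant, assuming your processor supports XSLT 3.0: the xsl:mode declaration replaces the extra node() template, and the matching template uses xsl:copy-of so that the elements are kept. The trimmed namespace list and the shortened match pattern are my own simplifications, not something your stylesheet requires.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:foxml="info:fedora/fedora-system:def/foxml#"
    xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    exclude-result-prefixes="foxml oai_dc"
    version="3.0">
  <xsl:output method="xml" indent="yes"/>
  <!-- Nodes with no matching template are skipped instead of having
       their text copied to the output; their children are still visited. -->
  <xsl:mode on-no-match="shallow-skip"/>
  <!-- Only the xmlContent of the DC.1 version produces any output. -->
  <xsl:template match="foxml:datastreamVersion[@ID eq 'DC.1']/foxml:xmlContent">
    <metadata>
      <xsl:copy-of select="oai_dc:dc"/>
    </metadata>
  </xsl:template>
</xsl:stylesheet>
```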
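As for the second question, because of the path problem your template never matched at all, so everything you saw came from the built-in templates, which output text only. Even once the template does match, <xsl:value-of select="oai_dc:dc"/> outputs just the string value of the element, that is, all of its descendant text nodes joined together. Use xsl:copy-of instead to copy the elements themselves.
Try this XSLT: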
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:audit="info:fedora/fedora-system:def/audit#" xmlns:premis="http://www.loc.gov/standards/premis/v1"
exclude-result-prefixes="xs dc oai_dc audit premis foxml xsi"
version="2"
xmlns:foxml="info:fedora/fedora-system:def/foxml#"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="info:fedora/fedora-system:def/foxml# http://www.fedora.info/definitions/1/0/foxml1-1.xsd">
<!-- No name attribute: a named xsl:output is only used by xsl:result-document, not by the main result -->
<xsl:output method="xml" indent="yes"/>
<xsl:template match="node()">
<xsl:apply-templates />
</xsl:template>
<xsl:template match="/foxml:digitalObject/foxml:datastream/foxml:datastreamVersion[@ID eq 'DC.1']/foxml:xmlContent">
<metadata>
<xsl:copy-of select="oai_dc:dc"/>
</metadata>
</xsl:template>
</xsl:stylesheet>
Alternatively, match the document node and use the select attribute of xsl:apply-templates to target the node you wish to copy:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:audit="info:fedora/fedora-system:def/audit#" xmlns:premis="http://www.loc.gov/standards/premis/v1"
exclude-result-prefixes="xs dc oai_dc audit premis foxml xsi"
version="2"
xmlns:foxml="info:fedora/fedora-system:def/foxml#"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="info:fedora/fedora-system:def/foxml# http://www.fedora.info/definitions/1/0/foxml1-1.xsd">
<!-- No name attribute: a named xsl:output is only used by xsl:result-document, not by the main result -->
<xsl:output method="xml" indent="yes"/>
<xsl:template match="/">
<xsl:apply-templates select="foxml:digitalObject/foxml:datastream/foxml:datastreamVersion[@ID eq 'DC.1']/foxml:xmlContent" />
</xsl:template>
<xsl:template match="foxml:xmlContent">
<metadata>
<xsl:copy-of select="oai_dc:dc"/>
</metadata>
</xsl:template>
</xsl:stylesheet>