I am trying to set up a Schematron test that validates special characters in XML.
More specifically, I would like to raise a warning wherever the copyright symbol (Unicode U+00A9) occurs.
It seems that my Schematron file cannot be parsed when the rules use any of the following notations:
<iso:rule context="myelement">
<iso:report test="matches(., '\u00A9')">{ES1037} Copyright Symbol Detected</iso:report>
</iso:rule>
<iso:rule context="myelement">
<iso:report test="matches(., '\u{00A9}')">{ES1037} Copyright Symbol Detected</iso:report>
</iso:rule>
<iso:rule context="myelement">
<iso:report test="matches(., '\u{A9}')">{ES1037} Copyright Symbol Detected</iso:report>
</iso:rule>
<iso:rule context="myelement">
<iso:report test="matches(., '\x{00A9}')">{ES1037} Copyright Symbol Detected</iso:report>
</iso:rule>
Are there any Schematron experts out there who know how to embed a Unicode character in a regex?
Thanks in advance...
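XPath 2.0 regular expressions (which extend the XML Schema regex syntax) have no `\u` or `\x{...}` escapes, so none of those notations will work. Instead, write the character as an XML character reference, just as you would in an XML Schema pattern. The XML parser replaces the reference with U+00A9 before the regex is ever compiled: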
<?xml version="1.0" encoding="UTF-8"?>
<iso:schema xmlns:iso="http://purl.oclc.org/dsdl/schematron">
<iso:pattern id="unicode in regex">
<iso:rule context="a">
<iso:report test="matches(., '&#xA9;')">
Copyright found
</iso:report>
</iso:rule>
</iso:pattern>
</iso:schema>
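Applied to the `myelement` rule from the question, a complete schema might look like the sketch below. It assumes an XSLT 2.0 based Schematron implementation (hence `queryBinding="xslt2"`), because `matches()` does not exist in XPath 1.0, which is the default binding. The pattern id `copyright-symbol` is a hypothetical name, and `role="warning"` is only a common convention for marking severity; whether your toolchain actually treats it as a warning depends on how it processes the validation report.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Assumption: an XSLT 2.0 processor, so matches() from XPath 2.0 is available -->
<iso:schema xmlns:iso="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
  <!-- "copyright-symbol" is a hypothetical pattern id -->
  <iso:pattern id="copyright-symbol">
    <iso:rule context="myelement">
      <!-- &#xA9; is expanded to U+00A9 by the XML parser, so the regex sees a literal copyright sign -->
      <!-- role="warning" is a convention only; severity handling depends on your tooling -->
      <iso:report test="matches(., '&#xA9;')" role="warning">{ES1037} Copyright Symbol Detected</iso:report>
    </iso:rule>
  </iso:pattern>
</iso:schema>
```

If you do not need a regex at all, `contains(., codepoints-to-string(169))` should be an equivalent test under the same XPath 2.0 binding (169 is the decimal value of 0xA9). A literal © in a UTF-8 encoded schema would also work; the character reference simply keeps the schema file ASCII-only.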