I am writing a Schematron schema to validate the following XML file:
<root version="1.0">
  <zone map="fields.map" display_name="Fields">
    <zone.rectangles>
      <rectangle h="2" w="2" x="0" y="0" />
    </zone.rectangles>
  </zone>
</root>
I want to make sure that if an element declares an attribute, then it cannot also contain a child element named after that attribute, in the form {element}.{attribute}.
For instance, if <zone> has an attribute map, <zone> cannot contain an element <zone.map>.
Therefore, the previous XML file is valid, but the following one is not:
Not valid:
<root version="1.0">
  <zone map="fields.map" display_name="Fields">
    <zone.map>fields.map</zone.map>
    <zone.rectangles>
      <rectangle h="2" w="2" x="0" y="0" />
    </zone.rectangles>
  </zone>
</root>
This one, on the other hand, is valid:
Valid:
<root version="1.0">
  <zone display_name="Fields">
    <zone.map>fields.map</zone.map>
    <zone.rectangles>
      <rectangle h="2" w="2" x="0" y="0" />
    </zone.rectangles>
  </zone>
</root>
I got it working with this Schematron file:
<schema xmlns="http://purl.oclc.org/dsdl/schematron">
  <pattern>
    <title>Attribute usage</title>
    <!-- Every element that has attributes -->
    <rule context="*[@*]">
      <!-- The name of its children should not be {element}.{attribute} -->
      <assert test="name(*) != concat(name(), '.', name(@*))">
        The attribute <name />.<value-of select="name(@*)" /> is defined twice.
      </assert>
    </rule>
  </pattern>
</schema>
It took me about 4 hours and numerous failed attempts to get this working, so I was pretty happy with this schema and started testing it a bit more.
I was really disappointed to find that it only works for the first attribute of each element. For example, with the zone element, only the map attribute is tested. Putting a <zone.display_name> element inside <zone map="" display_name=""> won't make the schema fail, while swapping the attributes, as in <zone display_name="" map="">, will trigger a failure.
What seems to be the issue, if I understand correctly, is that the wildcard @* is not treated as a list in concat(name(), '.', name(@*)). When name() receives a node-set, it returns the name of only the first node in it, so concat() gets a single string, as stated in this answer. The same goes for name(*), which only looks at the first child.
So how can I check that, for every attribute, there is no matching child element?
It's a nested loop that could be represented in pseudocode as:
for attribute in element.attributes:
    for child in element.children:
        if child.name == element.name + "." + attribute.name:
            raise Error
Any idea? I feel like I'm so near!
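For reference, the pseudocode above translates almost directly into plain Python. This is not Schematron: it is a minimal sketch using the standard library's xml.etree.ElementTree, assuming documents without namespaces (ElementTree reports namespaced names as {uri}local, which would break the string comparison). The function name duplicated_attributes and the sample strings are illustrative only. The point it shows is that the nested loop visits every attribute and every child, which the single name(@*) and name(*) calls in the schema cannot do.

```python
import xml.etree.ElementTree as ET


def duplicated_attributes(xml_text):
    """Return (element, attribute) pairs where the element also has a child
    named '<element>.<attribute>'. Assumes no XML namespaces."""
    root = ET.fromstring(xml_text)
    errors = []
    for element in root.iter():              # every element in the document
        for attribute in element.attrib:     # every attribute, not just the first
            for child in element:            # every direct child
                if child.tag == element.tag + "." + attribute:
                    errors.append((element.tag, attribute))
    return errors


INVALID = """<root version="1.0">
  <zone map="fields.map" display_name="Fields">
    <zone.map>fields.map</zone.map>
    <zone.rectangles>
      <rectangle h="2" w="2" x="0" y="0" />
    </zone.rectangles>
  </zone>
</root>"""

VALID = """<root version="1.0">
  <zone display_name="Fields">
    <zone.map>fields.map</zone.map>
    <zone.rectangles>
      <rectangle h="2" w="2" x="0" y="0" />
    </zone.rectangles>
  </zone>
</root>"""

# The case the first schema missed: the duplicate is on the second attribute.
SECOND_ATTR = '<root><zone map="" display_name=""><zone.display_name/></zone></root>'

print(duplicated_attributes(INVALID))      # [('zone', 'map')]
print(duplicated_attributes(VALID))        # []
print(duplicated_attributes(SECOND_ATTR))  # [('zone', 'display_name')]
```

The third sample is the one that slipped through the first schema: here the display_name attribute is checked as well, not only map.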
I finally got it working by using a variable.
I used this Schematron schema:
<schema xmlns="http://purl.oclc.org/dsdl/schematron">
  <pattern>
    <title>Attribute usage</title>
    <!-- Elements that contain a dot in their name -->
    <rule context="*[contains(name(), '.')]">
      <!-- Take the part after the dot -->
      <let name="attr_name" value="substring-after(name(), '.')" />
      <!-- Check that there is no parent's attribute with the same name -->
      <assert test="count(../@*[name() = $attr_name]) = 0">
        The attribute <name /> is defined twice.
      </assert>
    </rule>
  </pattern>
</schema>
Schematron is really powerful but you gotta get the hang of it...
If you want to loop over a wildcard * or @*, then count() is your friend, because it actually takes every node in the set into account.
If you find yourself stuck, try turning the problem upside down. I was looping over attributes, then over children, while now I'm looping over every dotted element, then checking its parent's attributes.
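The inverted approach can be mirrored in Python the same way. Again, this is an illustrative sketch with xml.etree.ElementTree, not part of the Schematron solution, and it assumes no namespaces. ElementTree elements do not know their parent, so the sketch builds a child-to-parent map to stand in for the .. axis; the helper name find_shadowing_elements is hypothetical.

```python
import xml.etree.ElementTree as ET


def find_shadowing_elements(xml_text):
    """Return the names of dotted elements whose parent has an attribute
    named after the part following the first dot. Assumes no XML namespaces."""
    root = ET.fromstring(xml_text)
    # ElementTree has no parent axis, so map each child to its parent (stands in for "..").
    parent_of = {child: parent for parent in root.iter() for child in parent}
    errors = []
    for element in root.iter():
        if "." not in element.tag:                # rule context: *[contains(name(), '.')]
            continue
        attr_name = element.tag.split(".", 1)[1]  # substring-after(name(), '.')
        parent = parent_of.get(element)           # None for the root element
        if parent is not None and attr_name in parent.attrib:
            errors.append(element.tag)
    return errors


INVALID = '<root><zone map="fields.map"><zone.map>fields.map</zone.map></zone></root>'
VALID = '<root><zone display_name="Fields"><zone.map>fields.map</zone.map></zone></root>'

print(find_shadowing_elements(INVALID))  # ['zone.map']
print(find_shadowing_elements(VALID))    # []
```

Like the Schematron rule, this compares only the part after the first dot with the parent's attribute names; it does not check that the part before the dot matches the parent's name, so <other.map> inside <zone map="x"> is reported too.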
If you need a value from the outer context but find yourself stuck inside a [] predicate, use a variable to carry the value in.
For instance, if you try ../@*[name() = name(..)], it won't do what you want, because name(..) inside [] refers to the attribute's parent's name, not the current context element's name. If you extract the value as <let name="element_name" value="name()" />, then you're good to go: ../@*[name() = $element_name].
Inside square brackets, the context node becomes the node being filtered, so relative expressions such as name() or .. no longer refer to the rule's context element; use variables to bring those values in.
You can also use the current() function (an XSLT function, available with Schematron's default XSLT query binding) to get the rule's context element from within brackets, without having to use a variable. My final schema is:
<schema xmlns="http://purl.oclc.org/dsdl/schematron">
  <pattern>
    <title>Attribute usage</title>
    <!-- Elements that contain a dot in their name -->
    <rule context="*[contains(name(), '.')]">
      <!-- Check that there is no parent's attribute with the same name -->
      <assert test="not(../@*[name() = substring-after(name(current()), '.')])">
        The attribute <name /> is defined twice.
      </assert>
    </rule>
  </pattern>
</schema>
Thanks to Eiríkr Útlendi for that!