I would like to validate a custom XML document against a schema. This document would include a structure with any number of elements, each having a specific attribute. Something like this:
<Root xmlns="http://tns">
  <Records>
    <FirstRecord attr='whatever'>content for first record</FirstRecord>
    <SecondRecord attr='whatever'>content for first record</SecondRecord>
    ...
    <LastRecord attr='whatever'>content for first record</LastRecord>
  </Records>
</Root>
The author of the XML document can include any number of records, each with an arbitrary name of his or her choosing. How can I validate this against an XML Schema?
I have defined the record type in a schema, but I do not know how to reference it in the right place:
<xs:schema xmlns="http://tns" targetNamespace="http://tns"
           xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified">
  <xs:complexType name="RecordType"> <!-- This is my record type -->
    <xs:simpleContent>
      <xs:extension base="xs:string">
        <xs:attribute name="attr" type="xs:string" use="required" />
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>
  <xs:element name="Root">
    <xs:complexType>
      <xs:sequence>
        <xs:element minOccurs="1" maxOccurs="1" name="Records">
          <!-- This is where records should go -->
          <xs:complexType />
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
What you describe is possible in XSD 1.1, and something very similar is possible in XSD 1.0, which is not to say it's an advisable design.
In XML vocabularies, the element type name normally conveys what kind of information an element carries, and in most XML schema languages it is the element type name that drives validation. The design you describe is (some would say) a little like asking whether you can define an object class in Java, or a struct in C, whose members can have arbitrary names as long as one of them is an integer with the value 42. That, or something like it, may well be possible, but most experienced designers will feel strongly that it is almost certainly not the right way to solve any normal problem.
On the other hand, doing unusual and awkward things with a system can sometimes help in learning how to use the system effectively. (You never know a system well, said a friend of mine once, until you have thoroughly abused it.) So my answer has two parts: how to come as close as possible to the design you specify in XSD, and alternatives you might consider instead.
The simplest way to specify the language you seem to want in XSD 1.1 is to give the Records element a wildcard content model that accepts any number of elements of any name, plus two assertions which say (1) that every child of Records has an 'attr' attribute and (2) that no child of Records has child elements of its own. You'll have something like this:
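...
<xs:element minOccurs="1" maxOccurs="1" name="Records">
  <xs:complexType>
    <xs:sequence>
      <!-- any number of children with any name; "lax" validates a child
           only if a declaration for it exists (the default, "strict",
           would reject the undeclared FirstRecord, SecondRecord, etc.) -->
      <xs:any processContents="lax" maxOccurs="unbounded"/>
    </xs:sequence>
    <!-- (1) every child of Records carries an attr attribute -->
    <xs:assert
      test="every $child in * satisfies $child/@attr"/>
    <!-- (2) no child of Records has element children -->
    <xs:assert
      test="not(*/*)"/>
  </xs:complexType>
</xs:element>
...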
As you can see, this is very similar to what InfantPro'Aravind' has described; it avoids the problems InfantPro'Aravind' identified by using assertions, not type assignment, to impose the constraints you want.
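For reference, here is a minimal sketch of a complete XSD 1.1 schema document along the lines of the fragment above, assuming the 'http://tns' namespace from the question. Because Records is declared locally and the instance puts it in the default namespace, the sketch sets elementFormDefault="qualified"; without it, a locally declared Records would have to appear unqualified in the instance. It needs an XSD 1.1 processor, since xs:assert does not exist in XSD 1.0.

```xml
<xs:schema xmlns="http://tns"
           targetNamespace="http://tns"
           xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified">
  <!-- Requires XSD 1.1: xs:assert is not part of XSD 1.0 -->
  <xs:element name="Root">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Records">
          <xs:complexType>
            <xs:sequence>
              <!-- any number of child elements of any name;
                   each is validated only if a declaration for it exists -->
              <xs:any processContents="lax" maxOccurs="unbounded"/>
            </xs:sequence>
            <!-- (1) every child of Records carries an attr attribute -->
            <xs:assert test="every $child in * satisfies $child/@attr"/>
            <!-- (2) no child of Records has element children -->
            <xs:assert test="not(*/*)"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Note that the assertions only check for the presence of 'attr' and the absence of grandchildren; text content of any kind is still accepted in each record.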
In XSD 1.0, assertions are not available, and the only way I can think of to come close to the design you describe is to define an abstract element, which I'll call Record, as the child of Records, and to require that the elements which actually occur as children of Records be declared as substitutable for this abstract element (which in turn requires that their types be derived from type RecordType). Your schema might say something like this:
<xs:element name="Root">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="Records">
        <xs:complexType>
          <xs:sequence>
            <!-- ref, not name: refer to the global abstract Record so that
                 members of its substitution group can appear here -->
            <xs:element ref="Record" maxOccurs="unbounded"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
</xs:element>
<xs:complexType name="RecordType">
  <xs:simpleContent>
    <xs:extension base="xs:string">
      <xs:attribute name="attr"
                    type="xs:string"
                    use="required" />
    </xs:extension>
  </xs:simpleContent>
</xs:complexType>
<xs:element name="Record"
            type="RecordType"
            abstract="true"/>
Elsewhere in the schema (possibly in a separate schema document) you will need to declare FirstRecord, etc., and specify that they are substitutable for Record, thus:
<xs:element name="FirstRecord" substitutionGroup="Record"/>
<xs:element name="SecondRecord" substitutionGroup="Record"/>
<xs:element name="ThirdRecord" substitutionGroup="Record"/>
...
At some level, this matches your description, though I suspect you did not want to have to declare FirstRecord, SecondRecord, etc.
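Putting the XSD 1.0 pieces together, a complete schema document might look like the sketch below (again assuming the 'http://tns' namespace and elementFormDefault="qualified"). Two details matter: Records must refer to the abstract head with ref="Record", because a local declaration with name="Record" would be an unrelated element that FirstRecord etc. cannot substitute for, and maxOccurs="unbounded" is what allows more than one record. A member declared without a type attribute takes its type from the head of its substitution group; the 'priority' attribute on SecondRecord is purely hypothetical, there only to illustrate a member whose type is derived from RecordType by extension.

```xml
<xs:schema xmlns="http://tns"
           targetNamespace="http://tns"
           xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified">

  <xs:element name="Root">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Records">
          <xs:complexType>
            <xs:sequence>
              <!-- reference the abstract head; members substitute for it -->
              <xs:element ref="Record" maxOccurs="unbounded"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:complexType name="RecordType">
    <xs:simpleContent>
      <xs:extension base="xs:string">
        <xs:attribute name="attr" type="xs:string" use="required"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>

  <!-- abstract head: can never appear in an instance itself -->
  <xs:element name="Record" type="RecordType" abstract="true"/>

  <!-- no type attribute: the type RecordType is taken from the head -->
  <xs:element name="FirstRecord" substitutionGroup="Record"/>

  <!-- hypothetical member whose type extends RecordType
       with an optional 'priority' attribute -->
  <xs:element name="SecondRecord" substitutionGroup="Record">
    <xs:complexType>
      <xs:simpleContent>
        <xs:extension base="RecordType">
          <xs:attribute name="priority" type="xs:integer"/>
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>

  <xs:element name="LastRecord" substitutionGroup="Record"/>
</xs:schema>
```

Any element name not declared as a member of the group (say, a ThirdRecord that nobody declared) will be rejected, which is exactly the price of this approach.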
Having described ways in which XSD can do what you describe, I should also say that I wouldn't recommend either of these approaches. Instead, I'd design the XML vocabulary differently to work more naturally with XSD.
In the design as you specify it, every record appears to have the same type, but in addition to its content, each record can convey a small amount of additional information through its name (FirstRecord, SecondRecord, etc.). That information could just as easily be conveyed in an attribute, which would allow you to declare Record as a concrete element rather than an abstract one, with an extra "alternate-name" attribute. Your sample data would then take a form like this:
<Root xmlns="http://tns">
  <Records>
    <Record
        alternate-name="FirstRecord"
        attr='whatever'>content for first record</Record>
    <Record
        alternate-name="SecondRecord"
        attr='whatever'>content for first record</Record>
    ...
    <Record
        alternate-name="LastRecord"
        attr='whatever'>content for first record</Record>
  </Records>
</Root>
This will be more or less acceptable depending on whether you or your data providers or tools in your tool chain attach some mystic or other significance to having the string "FirstRecord" be an element type name instead of an attribute value.
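With this design the schema needs neither an abstract element nor a substitution group. A minimal sketch, assuming the same 'http://tns' namespace; making 'alternate-name' required and typing it as xs:NCName (so its values stay name-like, as the element names were) are my own choices, and xs:string would work just as well.

```xml
<xs:schema xmlns="http://tns"
           targetNamespace="http://tns"
           xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified">

  <xs:complexType name="RecordType">
    <xs:simpleContent>
      <xs:extension base="xs:string">
        <xs:attribute name="attr" type="xs:string" use="required"/>
        <!-- carries the name that used to be the element type name -->
        <xs:attribute name="alternate-name" type="xs:NCName" use="required"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>

  <xs:element name="Root">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Records">
          <xs:complexType>
            <xs:sequence>
              <!-- one concrete element name, repeated as often as needed -->
              <xs:element name="Record" type="RecordType"
                          maxOccurs="unbounded"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

This works with any XSD 1.0 processor, and adding a new kind of record requires no schema change at all.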
Alternatively, one could say that the point of the design is to allow Records to contain an arbitrary sequence of elements of arbitrary structure (on this account, the restriction to xs:string is just an artifact of your example and is not really wanted) as long as each record carries the information recorded in the 'attr' attribute. This is easy to specify: define 'Record' as a concrete element with an 'attr' attribute, accepting one child which can be any XML element:
<xs:element name="Record">
  <xs:complexType>
    <xs:sequence>
      <xs:any processContents="lax"/>
    </xs:sequence>
    <xs:attribute name="attr"
                  use="required"
                  type="xs:string"/>
  </xs:complexType>
</xs:element>
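The value of the 'processContents' attribute can be changed to 'strict' or 'skip' or kept at 'lax', depending on whether you want FirstRecord, SecondRecord, etc. to be declared and validated: 'strict' requires a declaration for each child and validates it, 'lax' validates a child only when a declaration for it is available, and 'skip' does not validate the child at all.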
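To show where that Record declaration fits, here is a minimal sketch of a whole schema document for this variant, assuming, as before, the 'http://tns' namespace and elementFormDefault="qualified". Records now holds any number of Record wrappers, each with exactly one child element of any name.

```xml
<xs:schema xmlns="http://tns"
           targetNamespace="http://tns"
           xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified">

  <xs:element name="Root">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Records">
          <xs:complexType>
            <xs:sequence>
              <!-- any number of Record wrappers -->
              <xs:element ref="Record" maxOccurs="unbounded"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:element name="Record">
    <xs:complexType>
      <xs:sequence>
        <!-- exactly one child element of any name; validated only
             if a declaration for it is available -->
        <xs:any processContents="lax"/>
      </xs:sequence>
      <xs:attribute name="attr" use="required" type="xs:string"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

A record in the instance would then look like <Record attr='whatever'><FirstRecord>content for first record</FirstRecord></Record>, with 'attr' moved from the named element to its wrapper.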