partial schema included in multiple subschemas


My aim is to make a modular XML schema that has some shared types in one file available to all subschema files. What's the best way to go about this?

Example:

Say I want to build an XML schema which describes XML documents about cars and bikes. I then create a schema for the XML, which I divide up into 4 files: vehicles.xsd, cars.xsd, bikes.xsd and types.xsd. vehicles.xsd includes cars.xsd and bikes.xsd and they both in turn include types.xsd. I noticed when trying out this example with the command

xmllint --schema vehicles.xsd vehicles.xml

that it validates fine, even though I was expecting a conflict to arise because of the double inclusion of types.xsd (which leads to 2 definitions of the complexType vehicleType). Removing the <include> tag from either cars.xsd or bikes.xsd also validates just fine. Can someone explain to me what is going on here?

XML and XSDs:

vehicles.xml:

<vehicles xmlns="http://example.com/vehicles">
  <cars>
    <car make="Porsche" model="911" />
    <car make="Porsche" model="911" />
  </cars>
  <bikes>
    <bike make="Harley-Davidson" model="WL" />
    <bike make="Yamaha" model="XS650" />
  </bikes>
</vehicles>

vehicles.xsd:

<xs:schema
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:vh="http://example.com/vehicles"
  targetNamespace="http://example.com/vehicles"
  elementFormDefault="qualified">

  <xs:include schemaLocation="cars.xsd" />
  <xs:include schemaLocation="bikes.xsd" />

  <xs:element name="vehicles">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="vh:cars" />
        <xs:element ref="vh:bikes" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

cars.xsd:

<xs:schema
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:vh="http://example.com/vehicles"
  targetNamespace="http://example.com/vehicles"
  elementFormDefault="qualified">

  <xs:include schemaLocation="types.xsd" />

  <xs:element name="cars">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="car" type="vh:vehicleType"
          minOccurs="0" maxOccurs="unbounded" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

bikes.xsd:

<xs:schema
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:vh="http://example.com/vehicles"
  targetNamespace="http://example.com/vehicles"
  elementFormDefault="qualified">

  <xs:include schemaLocation="types.xsd" />

  <xs:element name="bikes">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="bike" type="vh:vehicleType"
          minOccurs="0" maxOccurs="unbounded" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

types.xsd:

<xs:schema
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  targetNamespace="http://example.com/vehicles">

  <xs:complexType name="vehicleType">
    <xs:attribute name="make" type="xs:string" />
    <xs:attribute name="model" type="xs:string" />
  </xs:complexType>
</xs:schema>

Solution

  • Most XSD processors notice when they are asked to include a schema document, such as types.xsd, that they have already included, and they skip the second inclusion; the XSD spec explicitly encourages this. That is why the double inclusion produces no error messages. It is also why removing the include from either cars.xsd or bikes.xsd still works: all included documents are merged into one schema, so a single inclusion of types.xsd makes vehicleType available to every document in it.

    In general, however, there is slightly better interoperability among XSD processors if you keep things simpler by doing all inclusions from a single top-level driver file. If you used that idiom, you'd drop the xs:include elements from all your schema documents, and make one or more new driver documents which contain nothing but inclusions (one if you only want one schema; multiple driver documents if you want different schemas with different sets of elements).

    Similar considerations apply to the use of the schemaLocation attribute on xs:import elements. The use of this idiom helps avoid situations (especially situations involving redefinition and reference cycles) which produce dramatically different results from different processors.
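
As a minimal sketch of the driver-file idiom described above, applied to the cars-and-bikes example: the file name vehicles-driver.xsd is hypothetical, and the sketch assumes the xs:include elements have been removed from vehicles.xsd, cars.xsd and bikes.xsd, so that every inclusion happens exactly once, from the driver.

```xml
<!-- vehicles-driver.xsd (hypothetical name): contains nothing but inclusions. -->
<!-- Assumes vehicles.xsd, cars.xsd and bikes.xsd no longer have their own xs:include elements. -->
<xs:schema
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  targetNamespace="http://example.com/vehicles">

  <!-- Shared types, included exactly once -->
  <xs:include schemaLocation="types.xsd" />

  <!-- Element declarations that refer to vh:vehicleType -->
  <xs:include schemaLocation="cars.xsd" />
  <xs:include schemaLocation="bikes.xsd" />

  <!-- Top-level vehicles element that refers to vh:cars and vh:bikes -->
  <xs:include schemaLocation="vehicles.xsd" />
</xs:schema>
```

Validation would then point at the driver rather than at vehicles.xsd, for example xmllint --schema vehicles-driver.xsd vehicles.xml. The individual documents are no longer complete schemas on their own, which is the trade-off of this idiom: they are only meant to be used through a driver.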