
Can you have more than one indicator in an XSD complex element?


I'm trying to build a new schema at work to validate XML against, but I'm having a hard time answering this question: can I create a complex element in which some child elements must appear in a set sequence and others need not, and if so, how? I'd expect to be able to wrap one set of elements in opening and closing "sequence" tags and the other set in opening and closing "all" tags, but XSD doesn't seem to allow that. Here's what I have:

<xsd:complexType name="Original">
        <xsd:sequence>
            <xsd:element maxOccurs="1" minOccurs="1" name="AssetIdentifier" type="xsd:string">
                <xsd:annotation>
                    <xsd:documentation>Definition: The Asset Identifier element is intended to
                        reflect the root of all following digital filenames.</xsd:documentation>
                </xsd:annotation>
            </xsd:element>
            <xsd:element maxOccurs="1" minOccurs="0" name="ArchiveID" type="xsd:string">
                <xsd:annotation>
                    <xsd:documentation>Definition: The Filename element in this section is
                        intended to reflect the root of all the following derivative digital
                        filenames.</xsd:documentation>
                </xsd:annotation>
            </xsd:element>
            <xsd:element maxOccurs="1" minOccurs="1" name="Title" type="xsd:string">
                <xsd:annotation>
                    <xsd:documentation>Definition: The known title of the asset. If no title is
                        known, one can be assigned; a number or letter sequence, whichever is
                        the most logical. Using the value "unknown" is also
                        acceptable.</xsd:documentation>
                </xsd:annotation>
            </xsd:element>
            <xsd:element maxOccurs="1" minOccurs="1" name="RecordDate" type="xsd:date">
                <xsd:annotation>
                    <xsd:documentation>Definition: The actual recording date of the asset.
                        Estimates, partial dates, and date ranges (e.g. 19XX, Feb. 19-24,
                        1934-1935, etc.) are allowable, as is 'unknown'. Best practice, when
                        applicable, is to use the YYYY-MM-DD format in accordance with ISO 8601.
                        Even partial dates, e.g. 1990-05, should adhere to this
                        format.</xsd:documentation>
                </xsd:annotation>
            </xsd:element>
            <xsd:element maxOccurs="1" minOccurs="1" name="FormatType" type="xsd:string">
                <xsd:annotation>
                    <xsd:documentation>Definition: The format of the analog asset, e.g. Open
                        Reel, Grooved Disc, DAT, Cassette, VHS, 16mm film, EIAJ,
                        etc.</xsd:documentation>
                    <xsd:documentation>Best Practice: The MediaPreserve maintains a list of
                        controlled vocabularies organized by media type at: www.dontknowyet.com.
                        However, MP opted to make this an unrestricted element in the event
                        that other organizations have their own controlled vocabularies in
                        place.</xsd:documentation>
                </xsd:annotation>
            </xsd:element>
        </xsd:sequence>
        <xsd:all>
            <xsd:element maxOccurs="1" minOccurs="0" name="StockBrand" type="xsd:string">
                <xsd:annotation>
                    <xsd:documentation>If known definitively</xsd:documentation>
                </xsd:annotation>
            </xsd:element>
            <xsd:element maxOccurs="1" minOccurs="0" name="TapeModel" type="xsd:string">
                <xsd:annotation>
                    <xsd:documentation>If applicable. Usually applies to DAT tapes, open reels,
                        and wire recordings.</xsd:documentation>
                </xsd:annotation>
            </xsd:element>
            <xsd:element maxOccurs="1" minOccurs="0" name="TapeWidth" type="xsd:string">
                <xsd:annotation>
                    <xsd:documentation>Typically only applicable for open reel
                        audio</xsd:documentation>
                </xsd:annotation>
            </xsd:element>
        </xsd:all>
</xsd:complexType>


Solution

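  • XSDs unfortunately do not allow what you're trying to do: a complex type can contain only one top-level model group (<sequence/>, <choice/> or <all/>), so <sequence/> and <all /> can't sit side by side inside a single complex type or element. You might be able to achieve something similar with a nested content model, but note that <all> is restricted: it can only be the top-level group of a complex type (it can't appear inside <sequence> or <choice>), and in XSD 1.0 it can contain only element declarations. If you want an unordered group, you have to give it its own element. You can, however, nest <sequence> and <choice> inside each other freely.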
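
    To make those nesting rules concrete, here is a small schema sketch, assuming no target namespace. The type names TapeDetails and NestedGroups are hypothetical and only illustrate which arrangements a validator accepts:

    ```xml
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <!-- Allowed: <all> is the one and only model group of its complex type -->
      <xs:complexType name="TapeDetails">
        <xs:all>
          <xs:element name="StockBrand" type="xs:string" minOccurs="0"/>
          <xs:element name="TapeWidth" type="xs:string" minOccurs="0"/>
        </xs:all>
      </xs:complexType>

      <!-- Allowed: <sequence> and <choice> nested inside each other -->
      <xs:complexType name="NestedGroups">
        <xs:sequence>
          <xs:element name="AssetIdentifier" type="xs:string"/>
          <xs:choice minOccurs="0">
            <xs:element name="StockBrand" type="xs:string"/>
            <xs:sequence>
              <xs:element name="TapeModel" type="xs:string"/>
              <xs:element name="TapeWidth" type="xs:string"/>
            </xs:sequence>
          </xs:choice>
        </xs:sequence>
      </xs:complexType>

      <!-- Rejected (so not written out here): a second model group placed next to
           the first one, or an <all> placed inside a <sequence> or <choice>. -->
    </xs:schema>
    ```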

    From my understanding of XSDs, you have 3 viable options.

    The first is to wrap the elements you want under <all /> in their own sub-element:

    <xs:complexType name="Original">
      <xs:sequence>
        <!-- AssetIdentifier to FormatType left out for brevity -->
        <xs:element name="Misc">
          <xs:complexType>
            <xs:all>
              <xs:element maxOccurs="1" minOccurs="0" name="StockBrand" type="xs:string" />
              <xs:element maxOccurs="1" minOccurs="0" name="TapeModel" type="xs:string" />
              <xs:element maxOccurs="1" minOccurs="0" name="TapeWidth" type="xs:string" />
            </xs:all>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
    <!-- For the above, valid XML would be: -->
    <Original>
      <AssetIdentifier>AssetIdentifier0</AssetIdentifier>
      <Title>Title0</Title>
      <RecordDate>2006-05-04</RecordDate>
      <FormatType>FormatType0</FormatType>
      <Misc>
        <!-- Optional & order doesn't matter -->
        <StockBrand>what</StockBrand>
        <TapeWidth>1290</TapeWidth>
        <TapeModel>Hey</TapeModel>
      </Misc>
    </Original>
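
    The fragment above leaves out the ordered elements and the global element declaration that lets <Original> be the root of a document. A complete schema for this first option might look like the following sketch, assuming no target namespace. The type name MiscType is hypothetical, and making Misc itself optional (minOccurs="0") is an added assumption rather than part of the option as described:

    ```xml
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <!-- Global declaration so that <Original> can be the document element -->
      <xs:element name="Original" type="Original"/>

      <xs:complexType name="Original">
        <xs:sequence>
          <!-- Ordered elements from the question -->
          <xs:element name="AssetIdentifier" type="xs:string"/>
          <xs:element name="ArchiveID" type="xs:string" minOccurs="0"/>
          <xs:element name="Title" type="xs:string"/>
          <xs:element name="RecordDate" type="xs:date"/>
          <xs:element name="FormatType" type="xs:string"/>
          <!-- Assumption: the whole wrapper may be left out -->
          <xs:element name="Misc" type="MiscType" minOccurs="0"/>
        </xs:sequence>
      </xs:complexType>

      <!-- Hypothetical named type for the unordered group; <all> is its only model group -->
      <xs:complexType name="MiscType">
        <xs:all>
          <xs:element name="StockBrand" type="xs:string" minOccurs="0"/>
          <xs:element name="TapeModel" type="xs:string" minOccurs="0"/>
          <xs:element name="TapeWidth" type="xs:string" minOccurs="0"/>
        </xs:all>
      </xs:complexType>
    </xs:schema>
    ```

    Naming the unordered type also lets other elements reuse it.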
    
    

    The second is to nest those elements under another <sequence />. This saves you from defining another sub-element, but the elements must now appear in the order specified in the schema. Note that the nested sequence itself can be optional (here that changes nothing, since every element inside it is already optional).

    <xs:complexType name="Original">
      <xs:sequence>
        <!-- AssetIdentifier to FormatType left out for brevity -->
        <xs:sequence minOccurs="0">
          <xs:element maxOccurs="1" minOccurs="0" name="StockBrand" type="xs:string" />
          <xs:element maxOccurs="1" minOccurs="0" name="TapeModel" type="xs:string" />
          <xs:element maxOccurs="1" minOccurs="0" name="TapeWidth" type="xs:string" />
        </xs:sequence>
      </xs:sequence>
    </xs:complexType>
    
    <!-- For the above, valid XML would be: -->
    <Original>
      <AssetIdentifier>AssetIdentifier0</AssetIdentifier>
      <Title>Title0</Title>
      <RecordDate>2006-05-04</RecordDate>
      <FormatType>FormatType0</FormatType>
      <!-- Optional below, but must be ordered -->
      <StockBrand>what</StockBrand>
      <TapeModel>Hey</TapeModel>
      <TapeWidth>1290</TapeWidth>
    </Original>
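
    Making a nested sequence optional matters most when its elements are required among themselves. As a sketch, assuming a hypothetical rule that TapeModel and TapeWidth must appear together (in that order) or not at all, the elements inside the optional sequence drop their own minOccurs="0":

    ```xml
    <xs:complexType name="Original">
      <xs:sequence>
        <!-- AssetIdentifier to FormatType left out for brevity -->
        <xs:element name="StockBrand" type="xs:string" minOccurs="0"/>
        <!-- Hypothetical rule: both elements, in this order, or neither -->
        <xs:sequence minOccurs="0">
          <xs:element name="TapeModel" type="xs:string"/>
          <xs:element name="TapeWidth" type="xs:string"/>
        </xs:sequence>
      </xs:sequence>
    </xs:complexType>
    ```

    With this model, StockBrand alone, TapeModel followed by TapeWidth, or all three in order are valid, while a lone TapeModel is rejected.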
    

    The third option is a bit of a 'hack', but it still lets the elements appear in any order, keeps them optional, and places them directly alongside the mandatory, ordered elements. It nests a <choice> (with maxOccurs="3") under a <sequence> inside the parent sequence (sequence > sequence > choice); the inner sequence isn't strictly required, since the choice could sit directly in the parent sequence:

    <xs:complexType name="Original">
      <xs:sequence>
        <!-- AssetIdentifier to FormatType left out for brevity -->
        <xs:sequence>
          <xs:choice maxOccurs="3" minOccurs="0">
            <xs:element name="StockBrand" type="xs:string"/>
            <xs:element name="TapeModel" type="xs:string"/>
            <xs:element name="TapeWidth" type="xs:string"/>
          </xs:choice>
        </xs:sequence>
      </xs:sequence>
    </xs:complexType>
    <!-- For the above, valid XML would be: -->
    <Original>
      <AssetIdentifier>AssetIdentifier0</AssetIdentifier>
      <Title>Title0</Title>
      <RecordDate>2006-05-04</RecordDate>
      <FormatType>FormatType0</FormatType>
      <!-- Optional, unordered, but there's a catch: -->
      <TapeWidth>1290</TapeWidth>
      <StockBrand>what</StockBrand>
      <TapeModel>Hey</TapeModel>
    </Original>
    

    There's a catch with this third option, however: because the <choice /> can repeat up to three times, any minOccurs and maxOccurs on the child elements (StockBrand, TapeModel and TapeWidth) only apply within a single pick and no longer limit how often each element appears overall. Those elements stay optional, but each can now appear more than once, as long as no more than three of them appear in total:

    This becomes valid (2 of the same element + 1 more):

      <TapeWidth>1290</TapeWidth>
      <TapeWidth>1291</TapeWidth>
      <TapeModel>Hey</TapeModel>
    

    And this is still valid (3 of the same):

      <TapeWidth>1290</TapeWidth>
      <TapeWidth>1291</TapeWidth>
      <TapeWidth>1292</TapeWidth>
    

    And also this (just one occurrence of one element):

      <StockBrand>1290</StockBrand>
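
    If your validator supports XSD 1.1, an assertion can restore the at-most-once limit that the repeating choice loses. The sketch below assumes an XSD 1.1-aware processor and no target namespace (so the unprefixed names in the XPath test match the elements); an XSD 1.0-only processor will reject the xs:assert element:

    ```xml
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:element name="Original" type="Original"/>

      <xs:complexType name="Original">
        <xs:sequence>
          <xs:element name="AssetIdentifier" type="xs:string"/>
          <xs:element name="ArchiveID" type="xs:string" minOccurs="0"/>
          <xs:element name="Title" type="xs:string"/>
          <xs:element name="RecordDate" type="xs:date"/>
          <xs:element name="FormatType" type="xs:string"/>
          <!-- Third option: up to three picks, in any order -->
          <xs:choice minOccurs="0" maxOccurs="3">
            <xs:element name="StockBrand" type="xs:string"/>
            <xs:element name="TapeModel" type="xs:string"/>
            <xs:element name="TapeWidth" type="xs:string"/>
          </xs:choice>
        </xs:sequence>
        <!-- XSD 1.1 only: reject repeats the choice alone would allow.
             "le" is used instead of "<=" to avoid escaping "<" in the attribute. -->
        <xs:assert test="count(StockBrand) le 1 and count(TapeModel) le 1 and count(TapeWidth) le 1"/>
      </xs:complexType>
    </xs:schema>
    ```

    With the assertion in place, the two repeated-element examples above fail validation, while the single StockBrand example still passes.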
    

    You could probably find other options by experimenting with different combinations of nested sequence and choice, but it's best practice to keep your schemas simple. For that reason, I would personally recommend either of the first two options over the third.
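
    As an illustration of that kind of fiddling, and of why it gets unwieldy, here is a sketch that allows two of the optional elements in either order, each at most once, using only <choice> and <sequence>. It is a fragment meant to sit in the parent sequence after FormatType:

    ```xml
    <!-- Either order, each element at most once, both optional -->
    <xs:choice minOccurs="0">
      <!-- Branch chosen when StockBrand comes first -->
      <xs:sequence>
        <xs:element name="StockBrand" type="xs:string"/>
        <xs:element name="TapeModel" type="xs:string" minOccurs="0"/>
      </xs:sequence>
      <!-- Branch chosen when TapeModel comes first -->
      <xs:sequence>
        <xs:element name="TapeModel" type="xs:string"/>
        <xs:element name="StockBrand" type="xs:string" minOccurs="0"/>
      </xs:sequence>
    </xs:choice>
    ```

    Each branch is selected by its first element, which keeps the content model deterministic, but covering all three elements would take three branches, each containing a copy of this two-element pattern. That growth is exactly the complexity the first two options avoid.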