
XML to be validated against multiple xsd schemas


I'm writing the xsd and the code to validate, so I have great control here.

I would like to have an upload facility that adds stuff to my application based on an xml file. One part of the xml file should be validated against different schemas based on one of the values in the other part of it. Here's an example to illustrate:

<foo>
  <name>Harold</name>
  <bar>Alpha</bar>
  <baz>Mercury</baz>
  <!-- ... more general info that applies to all foos ... -->

  <bar-config>
    <!-- the content here is specific to the bar named "Alpha" -->
  </bar-config>
  <baz-config>
    <!-- the content here is specific to the baz named "Mercury" -->
  </baz-config>
</foo>

In this case, there is some controlled vocabulary for the content of <bar>, and I can handle that part just fine. Then, based on the bar value, the appropriate xml schema should be used to validate the content of bar-config. Similarly for baz and baz-config.

The code doing the parsing/validation is written in Java. Not sure how language-dependent the solution will be.

Ideally, the solution would permit the xml author to declare the appropriate schema locations and what-not so that s/he could get the xml validated on the fly in a sufficiently smart editor.

Also, the possible values for <bar> and <baz> are orthogonal, so I don't want to do this by extension for every possible bar/baz combo. What I mean is, if there are 24 possible bar values/schemas and 8 possible baz values/schemas, I want to be able to write 1 + 24 + 8 = 33 total schemas, instead of 1 * 24 * 8 = 192 total schemas.

Also, I'd prefer to NOT break out the bar-config and baz-config into separate xml files if possible. I realize that might make all the problems much easier, as each xml file would have a single schema, but I'm trying to see if there is a good single-xml-file solution.


Solution

  • I finally figured this out.

    First of all, in the foo schema, the bar-config and baz-config elements have a type which includes an any element, like this:

    <sequence>
        <any minOccurs="0" maxOccurs="1"
            processContents="lax" namespace="##any" />
    </sequence>
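
To see what the lax wildcard buys you, here is a minimal Java sketch. It assumes a cut-down stand-in for foo.xsd (only bar and bar-config, written inline as a string); the class and method names are made up for illustration. It suggests that, with only the foo schema loaded, anything inside bar-config is accepted unchecked, while the maxOccurs="1" limit is still enforced.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class LaxWildcardDemo {
    // Assumed, cut-down stand-in for foo.xsd: only <bar> and <bar-config>.
    static final String FOO_XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='foo'><xs:complexType><xs:sequence>"
        + "<xs:element name='bar' type='xs:string'/>"
        + "<xs:element name='bar-config'><xs:complexType><xs:sequence>"
        + "<xs:any minOccurs='0' maxOccurs='1'"
        + " processContents='lax' namespace='##any'/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    // Wraps the given markup in a minimal <foo> document.
    static String fooWith(String barConfigContent) {
        return "<foo><bar>Alpha</bar><bar-config>"
            + barConfigContent + "</bar-config></foo>";
    }

    // Returns true when the document validates against FOO_XSD alone.
    static boolean isValid(String xml) {
        try {
            Schema schema = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(FOO_XSD)));
            schema.newValidator()
                .validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // validation or well-formedness error
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

For example, LaxWildcardDemo.isValid(LaxWildcardDemo.fooWith("<config xmlns='http://www.example.org/bar/Alpha'><anything/></config>")) should pass, because no declaration for the Alpha namespace is loaded and the wildcard is lax; two child elements inside bar-config should fail.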
    

    In the xml, then, you must specify the proper namespace using the xmlns attribute on the child element of bar-config or baz-config, like this:

    <bar-config>
        <config xmlns="http://www.example.org/bar/Alpha">
            ... config xml here ...
        </config>
    </bar-config>
    

    Then, your XML schema file for bar Alpha will have a target namespace of http://www.example.org/bar/Alpha and will define the root element config.

    If your XML file has namespace declarations and schema locations for both of the schema files, this is sufficient for the editor to do all of the validating (at least good enough for Eclipse).
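
As a rough illustration of what those declarations might look like, the sketch below embeds a sample document in a Java string and reads the hints back with a namespace-aware parser. It assumes that foo itself is in no namespace (hence xsi:noNamespaceSchemaLocation) and that the schema files sit next to the XML file as foo.xsd and Alpha.xsd; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class SchemaLocationDemo {
    // Sample document: the xsi attributes tell an editor where each schema lives.
    // File locations are assumptions made for this sketch.
    static final String FOO_XML =
          "<foo xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'\n"
        + "     xsi:noNamespaceSchemaLocation='foo.xsd'\n"
        + "     xsi:schemaLocation='http://www.example.org/bar/Alpha Alpha.xsd'>\n"
        + "  <bar>Alpha</bar>\n"
        + "  <bar-config>\n"
        + "    <config xmlns='http://www.example.org/bar/Alpha'/>\n"
        + "  </bar-config>\n"
        + "</foo>";

    // Returns the value of the xsi:<localName> attribute on the root element,
    // or the empty string if the attribute is absent.
    static String hint(String localName) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(FOO_XML)));
            return doc.getDocumentElement().getAttributeNS(
                XMLConstants.W3C_XML_SCHEMA_INSTANCE_NS_URI, localName);
        } catch (SAXException | IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The xsi:schemaLocation value is a whitespace-separated list of namespace/location pairs, so a Mercury entry for baz could be appended to the same attribute.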

    So far, we have satisfied the requirement that the xml author may write the xml in such a way that it is validated in the editor.

    Now, we need the consumer to be able to validate. In my case, I'm using Java.

    If you happen to know ahead of time which schema files you will need, you can simply create a single Schema object from all of them and validate as usual, like this:

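    // factory() is assumed to return a SchemaFactory for W3C XML Schema;
    // stream(...) opens an input stream on the named file
    Schema schema = factory().newSchema(new Source[] {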
        new StreamSource(stream("foo.xsd")),
        new StreamSource(stream("Alpha.xsd")),
        new StreamSource(stream("Mercury.xsd")),
    });
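
A self-contained version of the same idea might look like the sketch below. It uses inline schema strings instead of files, and the Alpha schema content (a single integer element named level) is invented purely for illustration. Under those assumptions, once the Alpha schema is part of the combined Schema, the lax wildcard finds the config declaration and the content gets checked.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class CombinedSchemaDemo {
    // Assumed, cut-down stand-in for foo.xsd.
    static final String FOO_XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='foo'><xs:complexType><xs:sequence>"
        + "<xs:element name='bar' type='xs:string'/>"
        + "<xs:element name='bar-config'><xs:complexType><xs:sequence>"
        + "<xs:any minOccurs='0' maxOccurs='1'"
        + " processContents='lax' namespace='##any'/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    // Hypothetical Alpha.xsd: <config> holds one integer <level>.
    static final String ALPHA_XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
        + " targetNamespace='http://www.example.org/bar/Alpha'"
        + " elementFormDefault='qualified'>"
        + "<xs:element name='config'><xs:complexType><xs:sequence>"
        + "<xs:element name='level' type='xs:int'/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    // Builds a <foo> document whose Alpha config carries the given level text.
    static String fooWithLevel(String level) {
        return "<foo><bar>Alpha</bar><bar-config>"
            + "<config xmlns='http://www.example.org/bar/Alpha'>"
            + "<level>" + level + "</level></config>"
            + "</bar-config></foo>";
    }

    // Compiles all given schema texts into one Schema and validates the XML.
    static boolean isValid(String xml, String... xsds) {
        try {
            Source[] sources = new Source[xsds.length];
            for (int i = 0; i < xsds.length; i++) {
                sources[i] = new StreamSource(new StringReader(xsds[i]));
            }
            Schema schema = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(sources);
            schema.newValidator()
                .validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // validation or well-formedness error
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With both schemas passed in, fooWithLevel("abc") should be rejected; with FOO_XSD alone the same document should pass, since the lax wildcard has no Alpha declaration to check against.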
    

    In this case, however, we don't know which xsd files to use until we have parsed the main document. So, the general procedure is to:

    1. Validate the xml using only the main (foo) schema
    2. Determine which schema applies to the portion of the document still to be validated (here, from the value of <bar> or <baz>)
    3. Find the node that is the root of that portion
    4. Import that node into a brand new document
    5. Validate the brand new document using the other schema file

    Caveat: it appears that the document must be built namespace-aware in order for this to work.
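
A likely reason for the caveat is that a DOM built without namespace awareness carries no namespace information on its nodes, so the validator cannot match the element against a schema's target namespace. The small sketch below (class and method names are hypothetical) parses the same fragment both ways and reports the namespace URI seen on config.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class NamespaceAwareDemo {
    // No whitespace between tags, so <config> is the first child node.
    static final String XML =
        "<bar-config><config xmlns='http://www.example.org/bar/Alpha'/></bar-config>";

    // Parses XML with the given setting and returns the namespace URI
    // the DOM reports for <config> (null when none is recorded).
    static String configNamespace(boolean namespaceAware) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(namespaceAware);
            Document doc = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            Node config = doc.getDocumentElement().getFirstChild();
            return config.getNamespaceURI();
        } catch (SAXException | IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Note that DocumentBuilderFactory is not namespace-aware by default, which is why the explicit setNamespaceAware(true) call is needed.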

    Here's some code (this was ripped from various places of my code, so there might be some errors introduced by the copy-and-paste):

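    // Imports needed by the snippet below
    import javax.xml.XMLConstants;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.transform.dom.DOMSource;
    import javax.xml.transform.stream.StreamSource;
    import javax.xml.validation.Schema;
    import javax.xml.validation.SchemaFactory;
    import org.w3c.dom.Document;
    import org.w3c.dom.Node;

    // Contains the filename of the xml file
    String filename;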
    
    // Load the xml data using a namespace-aware builder (the method 
    // 'stream' simply opens an input stream on a file)
    Document document;
    DocumentBuilderFactory docBuilderFactory =
        DocumentBuilderFactory.newInstance();
    docBuilderFactory.setNamespaceAware(true);
    document = docBuilderFactory.newDocumentBuilder().parse(stream(filename));
    
    // Create the schema factory
    SchemaFactory sFactory = SchemaFactory.newInstance(
        XMLConstants.W3C_XML_SCHEMA_NS_URI);
    
    // Load the main schema
    Schema schema = sFactory.newSchema(
        new StreamSource(stream("foo.xsd")));
    
    // Validate using main schema
    schema.newValidator().validate(new DOMSource(document));
    
    // Get the node that is the root for the portion you want to validate
    // using another schema
    Node node = getSpecialNode(document);
    
    // Build a Document from that node
    Document subDocument = docBuilderFactory.newDocumentBuilder().newDocument();
    subDocument.appendChild(subDocument.importNode(node, true));
    
    // Determine the schema to use using your own logic
    Schema subSchema = parseAndDetermineSchema(document);
    
    // Validate using other schema
    subSchema.newValidator().validate(new DOMSource(subDocument));
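
The two helpers getSpecialNode and parseAndDetermineSchema are left undefined above. One possible shape for them is sketched below, under several assumptions: the node to validate is the first element child of bar-config, the schema is chosen by the text of bar, and the Alpha schema (with its level element) is an invented inline string rather than a real file. The validation of the main document against foo.xsd is omitted to keep the sketch short.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Collections;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class SubDocumentValidationDemo {
    // Hypothetical Alpha.xsd: <config> holds one integer <level>.
    static final String ALPHA_XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
        + " targetNamespace='http://www.example.org/bar/Alpha'"
        + " elementFormDefault='qualified'>"
        + "<xs:element name='config'><xs:complexType><xs:sequence>"
        + "<xs:element name='level' type='xs:int'/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    // bar value -> schema text; real code would more likely map to .xsd files.
    static final Map<String, String> BAR_SCHEMAS =
        Collections.singletonMap("Alpha", ALPHA_XSD);

    // Assumed behaviour: return the first element child of <bar-config>, or null.
    static Element getSpecialNode(Document document) {
        Node barConfig =
            document.getElementsByTagNameNS("*", "bar-config").item(0);
        if (barConfig == null) {
            return null;
        }
        for (Node n = barConfig.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) {
                return (Element) n;
            }
        }
        return null;
    }

    // Assumed behaviour: pick the schema from the text of <bar>.
    static Schema parseAndDetermineSchema(Document document) throws SAXException {
        String bar = document.getElementsByTagNameNS("*", "bar")
            .item(0).getTextContent().trim();
        String xsd = BAR_SCHEMAS.get(bar);
        if (xsd == null) {
            throw new IllegalArgumentException("No schema for bar: " + bar);
        }
        return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
            .newSchema(new StreamSource(new StringReader(xsd)));
    }

    // Copies the bar-config content into its own document and validates it.
    static boolean barConfigIsValid(String fooXml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            DocumentBuilder builder = dbf.newDocumentBuilder();
            Document document =
                builder.parse(new InputSource(new StringReader(fooXml)));
            Element node = getSpecialNode(document);
            if (node == null) {
                return true; // nothing to validate
            }
            Document sub = builder.newDocument();
            sub.appendChild(sub.importNode(node, true));
            parseAndDetermineSchema(document).newValidator()
                .validate(new DOMSource(sub));
            return true;
        } catch (SAXException e) {
            return false; // parse or validation error
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The same pattern would apply to baz and baz-config with a second lookup table, which keeps the number of schemas at 1 + 24 + 8 rather than one per combination.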