
How to define an XML Schema for recursive and unknown-named elements?


Is it possible to define an XML Schema for recursive, unknown-named elements? For example:

<Bob age="72">
    <FavoriteColor>"Blue"</FavoriteColor>
    <Children>
        <Sally age="36">
            <FavoriteColor>"Green"</FavoriteColor>
            <Children />
        </Sally>
        <Joe age="34">
            <FavoriteColor>"Red"</FavoriteColor>
            <Children>
                <Tina age="5">
                    <FavoriteColor>"Blue"</FavoriteColor>
                    <Children />
                </Tina>
                <Frank age="6">
                    <FavoriteColor>"Yellow"</FavoriteColor>
                    <Children />
                </Frank>
            </Children>
        </Joe>
    </Children>
</Bob>

I'm fairly new to XSD, but I think this requires some combination of recursion and <xs:any>.

See: "Recursion in an XML schema?" and "How can I define an XSD file that allows unknown (wildcard) elements?"
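The recursion-plus-wildcard combination the question has in mind can be sketched roughly as follows. This is a minimal sketch, not a solution. It assumes the root is always named Bob, because a root element still needs a declared name. The type name PersonType and the xs:nonNegativeInteger type for age are illustrative choices. It accepts the first example, but the recursion stops at the wildcard:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch of the "recursion + xs:any" attempt for the unknown-named form.
     Bob as the root name and PersonType are assumptions for illustration. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Only a known name can be declared, so the root is hard-coded as Bob. -->
  <xs:element name="Bob" type="PersonType"/>

  <xs:complexType name="PersonType">
    <xs:sequence>
      <xs:element name="FavoriteColor" type="xs:string"/>
      <xs:element name="Children">
        <xs:complexType>
          <xs:sequence>
            <!-- Any element name is allowed here. processContents="lax" validates a
                 child only if a global declaration with its name exists. Sally, Joe,
                 etc. have none, so PersonType is NOT applied to them and their
                 content goes unchecked. -->
            <xs:any processContents="lax" minOccurs="0" maxOccurs="unbounded"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
    <xs:attribute name="age" type="xs:nonNegativeInteger" use="required"/>
  </xs:complexType>

</xs:schema>
```

Because nothing declares Sally, Joe, Tina, or Frank, a child that is missing FavoriteColor or carries age="old" still passes. Switching to processContents="strict" would reject those children outright, since no global declarations exist for their names.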

However, I can't find a solution that doesn't involve rewriting my implied Person elements in the stricter form:

<Person name="Bob" age="72">
    <FavoriteColor />
    <Children />
</Person>

Is the original XML Schema-able?


Solution

  • You've correctly identified xsd:any as the way to allow arbitrary elements under Children, and you've correctly identified the recursive type pattern for representing family-tree-like structures.

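    However, xsd:any cannot partially constrain the elements it matches. Because your element names are not known in advance, there are no declarations for it to match them against, so once you use xsd:any you lose control over the markup beneath that point. Apart from occurrence limits, its only controls are the namespace attribute and processContents, which governs how matched elements are validated: strict (matched elements must be validated, which normally requires a global declaration), lax (validate only if a declaration is found), or skip (no validation at all).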

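    Trying to make the first example work will be an uphill battle. (You could use assertions in XSD 1.1, but you would be forcing assertions to do what ordinary content models are meant to do.) Go with the second form; it's better XML design: the element name says what the element is (a Person), and the data that varies (the name) goes in an attribute, where a schema can type and constrain it.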
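A minimal sketch of a schema for the second form follows. The type name PersonType and the attribute types (xs:string for name, xs:nonNegativeInteger for age) are assumptions chosen for illustration. The recursion works here because Children refers back to the same globally declared Person element:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch of a schema for the recommended second form. Type names and
     attribute types are assumptions chosen for illustration. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Global root element; the person's name moves from the tag into an attribute. -->
  <xs:element name="Person" type="PersonType"/>

  <xs:complexType name="PersonType">
    <xs:sequence>
      <!-- xs:string also accepts the empty <FavoriteColor /> from the example. -->
      <xs:element name="FavoriteColor" type="xs:string"/>
      <xs:element name="Children">
        <xs:complexType>
          <xs:sequence>
            <!-- Recursion: each child is another Person, validated by the same
                 type at every depth. -->
            <xs:element ref="Person" minOccurs="0" maxOccurs="unbounded"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
    <xs:attribute name="name" type="xs:string" use="required"/>
    <xs:attribute name="age" type="xs:nonNegativeInteger" use="required"/>
  </xs:complexType>

</xs:schema>
```

Bob's family then becomes nested elements such as `<Person name="Sally" age="36">` inside Bob's Children, and a descendant that is missing FavoriteColor or has a non-numeric age is rejected at any depth.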