xml xslt xhtml

How to make a conformant XML parser fill in a missing attribute and value, using an XML Schema or another XML language


I am looking for a simple, reusable way to get an XML parser to add an attribute and its associated value to any element of a specific name that is missing the attribute or has a different value, or else to throw a parsing error. Ideally, the solution would be supported by the newest versions of the major browsers, but browser support for XML is often not that great and varies quite a lot between browsers, so it's okay if it doesn't have browser support.

In truth, I would accept any standardized XML extension, namespace, etc. that makes this work.

For a better example of what I'm trying to do, let's take the XHTML script tag. I have an element whose tag name is *script*, which comes from the XHTML namespace (prefix *xhtml:*). I am looking for any instance of *xhtml:script* that is missing any of the attributes *async*, *defer*, or *type*, has values for them other than the ones I want, or has any textual content between the tags (it should be a self-closing tag).

I am trying to automatically make any script tag that looks like the following:

<script src="main.js" />

to be converted to:

<script async="async" defer="defer" type="module" src="main.js" />

Or, at least, to cause the XML parser to raise an error and halt parsing at that point.

Ideally, the author of the XML file isn't even allowed to use the attributes, as they should always be automatically filled in with the correct values.

I had thought that XSL(T) could work for this, but that would require generating a whole new output file. If I have misunderstood XSL, please feel free to correct me. I do want the solution to be as simple to use as XSL, though: to use XSL, all that has to be done to the source document is to add an xml-stylesheet processing instruction. Also, I'm not sure how well supported XSLT 2.0 is by browsers, if it is a

Because of this, I started looking at other related XML technologies and came across XML Schema (XSD).

After searching the web for information on it, I found that W3Schools says the following about the fixed attribute of the XML Schema attribute element:

A fixed value is also automatically assigned to the attribute when no other value is specified. But unlike default values; if you specify another value than the fixed, the document is considered invalid.

So, it seemed that this was exactly what I was looking for. After looking around a bit more, I made the following:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
    <xs:element name="script">
        <xs:simpleType>
            <xs:attribute name="type" type="xs:string" fixed="module" />
            <xs:attribute name="async" type="xs:string" fixed="async" />
            <xs:attribute name="defer" type="xs:string" fixed="defer" />
            <xs:attribute name="src" type="xs:string" use="required" />
        </xs:simpleType>
    </xs:element>
</xs:schema>

But this doesn't seem to have any effect on the parsing of the document.


Solution

  • Saxon's schema processor has an extension saxon:preprocess

    https://saxonica.com/documentation/index.html#!schema-processing/extensions11/preprocess

    and it occurs to me that this could be used to implement your requirement. If you (a) define a default value for the attribute, and (b) define a saxon:preprocess facet, then in the preprocess expression you can accept any input value and turn it into a valid value.
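
    A minimal sketch of what points (a) and (b) might look like for the script element, assuming the saxon:preprocess facet takes an XPath expression in an action attribute and exposes the incoming lexical value as $value, as the linked Saxon documentation describes. Only the type attribute is given the preprocess treatment here; async and defer keep the plain fixed constraint from the question, so a different value on them is a validation error. The attributes are declared on an xs:complexType, since xs:attribute is not allowed inside xs:simpleType, and the targetNamespace is an assumption so that the declaration applies to elements in the XHTML namespace.

    ```xml
    <?xml version="1.0" encoding="UTF-8"?>
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
               xmlns:saxon="http://saxon.sf.net/"
               targetNamespace="http://www.w3.org/1999/xhtml"
               elementFormDefault="qualified">
        <xs:element name="script">
            <!-- A complex type with attributes only, so the element must be empty. -->
            <xs:complexType>
                <!-- (a) default: supplied by the validator when the attribute is absent. -->
                <xs:attribute name="type" default="module">
                    <xs:simpleType>
                        <xs:restriction base="xs:string">
                            <!-- (b) Saxon-only facet: whatever was written ($value) is
                                 replaced by 'module' before the enumeration is checked. -->
                            <saxon:preprocess action="'module'"/>
                            <xs:enumeration value="module"/>
                        </xs:restriction>
                    </xs:simpleType>
                </xs:attribute>
                <!-- fixed: filled in when absent, invalid when given any other value. -->
                <xs:attribute name="async" type="xs:string" fixed="async"/>
                <xs:attribute name="defer" type="xs:string" fixed="defer"/>
                <xs:attribute name="src" type="xs:string" use="required"/>
            </xs:complexType>
        </xs:element>
    </xs:schema>
    ```

    Because saxon:preprocess is a vendor extension, other schema processors would be expected to reject this schema rather than ignore the facet.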

    To actually output a document containing the resulting values, you will need to supply the validated/preprocessed XML as input to a schema-aware XSLT stylesheet, which copies the typed value of the input to become the string value of the output: <xsl:attribute name="{name()}" select="data()"/>.
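
    As a rough sketch, such a schema-aware stylesheet could be an identity transform whose attribute rule rebuilds each attribute from its typed value. The file name script.xsd is hypothetical, and the source document is assumed to have been schema-validated before the transformation runs, since otherwise there are no defaulted attributes or typed values to copy.

    ```xml
    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet version="2.0"
                    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

        <!-- Make the schema's declarations known to the stylesheet.
             "script.xsd" is a hypothetical location for the schema. -->
        <xsl:import-schema namespace="http://www.w3.org/1999/xhtml"
                           schema-location="script.xsd"/>

        <!-- Copy elements, text, comments and processing instructions unchanged. -->
        <xsl:template match="node()">
            <xsl:copy>
                <xsl:apply-templates select="@* | node()"/>
            </xsl:copy>
        </xsl:template>

        <!-- Rebuild every attribute from its typed value, so that defaulted and
             preprocessed values end up in the serialized output.
             data(.) is the XPath 2.0 spelling; the zero-argument data()
             needs XPath 3.0. -->
        <xsl:template match="@*">
            <xsl:attribute name="{name()}" select="data(.)"/>
        </xsl:template>

    </xsl:stylesheet>
    ```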

    This isn't going to work in the browser (Saxon-JS doesn't support schema processing).
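
    For the browser case, a schema-free alternative is a plain XSLT 1.0 stylesheet attached with the xml-stylesheet processing instruction, since XSLT 1.0 is the version that browser XSLT engines implement. The sketch below is an assumption about what would fit the question, not part of the Saxon approach: it forces the three attributes on every xhtml:script and drops any content, and the transformation happens in memory, so no output file has to be generated.

    ```xml
    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet version="1.0"
                    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                    xmlns:xhtml="http://www.w3.org/1999/xhtml">

        <!-- Identity rule: copy everything that has no more specific template. -->
        <xsl:template match="@* | node()">
            <xsl:copy>
                <xsl:apply-templates select="@* | node()"/>
            </xsl:copy>
        </xsl:template>

        <!-- Rebuild each xhtml:script: keep the attributes the author wrote,
             then overwrite async, defer and type. An attribute added later
             replaces an earlier one with the same name. Child nodes are not
             processed, so the output element is always empty. -->
        <xsl:template match="xhtml:script">
            <xsl:copy>
                <xsl:apply-templates select="@*"/>
                <xsl:attribute name="async">async</xsl:attribute>
                <xsl:attribute name="defer">defer</xsl:attribute>
                <xsl:attribute name="type">module</xsl:attribute>
            </xsl:copy>
        </xsl:template>

    </xsl:stylesheet>
    ```

    This silently corrects the element rather than rejecting the document; an xsl:message with terminate="yes" in the xhtml:script template would be the XSLT 1.0 way to stop with an error instead.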