java xml xsd stax woodstox

Validation fails when writing XML with Woodstox / Stax2


I'm having an issue with XML validation using an XSD schema with Woodstox and Stax2. Validation fails even though the XML data conforms to the schema.

Surprisingly, the validation issue only occurs when writing XML (using XMLStreamWriter2), not when reading XML (using XMLStreamReader2).

I've built a small example to reproduce and isolate the error. It reads XML from a file into an XMLStreamReader2 (validating against an XSD schema), then copies it to an XMLStreamWriter2 (validating against the same XSD).

This fails with a validation error from the writer. If I disable validation on the writer, everything runs smoothly and the writer produces XML that conforms to the schema.

Here is the code:

import com.ctc.wstx.stax.WstxInputFactory;
import com.ctc.wstx.stax.WstxOutputFactory;
import org.codehaus.stax2.XMLStreamReader2;
import org.codehaus.stax2.XMLStreamWriter2;
import org.codehaus.stax2.validation.XMLValidationSchema;
import org.codehaus.stax2.validation.XMLValidationSchemaFactory;

import javax.xml.stream.XMLStreamException;
import java.io.InputStream;
import java.io.StringWriter;

public class Converter {

    public static void main(String... args) throws XMLStreamException {

        InputStream reader = Converter.class.getClassLoader().getResourceAsStream("test.xml");
        StringWriter writer = new StringWriter();

        XMLValidationSchema schema = XMLValidationSchemaFactory.newInstance(XMLValidationSchema.SCHEMA_ID_W3C_SCHEMA)
                .createSchema(Converter.class.getClassLoader().getResourceAsStream("schema.xsd"));


        XMLStreamReader2 xmlReader = (XMLStreamReader2) new WstxInputFactory().createXMLStreamReader(reader);
        xmlReader.validateAgainst(schema);

        XMLStreamWriter2 xmlWriter = (XMLStreamWriter2) new WstxOutputFactory().createXMLStreamWriter(writer);
        xmlWriter.validateAgainst(schema);

        xmlWriter.copyEventFromReader(xmlReader, false);

        while (xmlReader.hasNext()) {
            xmlReader.next();

            xmlWriter.copyEventFromReader(xmlReader, false);
        }

        System.out.println(writer.toString());
    }
}

Here is the XML:

<?xml version="1.0" encoding="UTF-8"?>
<JobStatus xsdVersion="NA">
    <Document>
        <DocumentId>1234567890</DocumentId>
    </Document>
    <Document>
        <DocumentId>1234567891</DocumentId>
    </Document>
</JobStatus>

And here is the schema:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema elementFormDefault="qualified"
           xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <xs:element name="JobStatus">
        <xs:complexType>
            <xs:sequence>
                <xs:element name="Document" maxOccurs="unbounded">
                    <xs:complexType>
                        <xs:sequence>
                            <xs:element name="DocumentId" type="xs:string"/>
                        </xs:sequence>
                    </xs:complexType>
                </xs:element>
            </xs:sequence>
            <xs:attribute name="xsdVersion" type="xs:string" use="required"/>
        </xs:complexType>
    </xs:element>
</xs:schema>

All of this results in the following (with validation on the writer enabled):

Exception in thread "main" com.ctc.wstx.exc.WstxValidationException: element "JobStatus" is missing "xsdVersion" attribute
        at [row,col {unknown-source}]: [1,66]
        at com.ctc.wstx.exc.WstxValidationException.create(WstxValidationException.java:50)
        at com.ctc.wstx.sw.BaseStreamWriter.reportProblem(BaseStreamWriter.java:1223)
        at com.ctc.wstx.msv.GenericMsvValidator.reportError(GenericMsvValidator.java:549)
        at com.ctc.wstx.msv.GenericMsvValidator.reportError(GenericMsvValidator.java:541)
        at com.ctc.wstx.msv.GenericMsvValidator.reportError(GenericMsvValidator.java:535)
        at com.ctc.wstx.msv.GenericMsvValidator.validateElementAndAttributes(GenericMsvValidator.java:343)
        at com.ctc.wstx.sw.BaseNsStreamWriter.closeStartElement(BaseNsStreamWriter.java:420)
        at com.ctc.wstx.sw.BaseStreamWriter.copyEventFromReader(BaseStreamWriter.java:807)
        at Converter.main(Converter.java:34)

Without validation on the writer, the program runs fine and returns the same XML provided as input (modulo some indentation and line-break differences).

So my question is: am I doing something wrong with Woodstox here? Why does validation fail only on the writer?

I can reproduce this issue with other XSD/XML pairs, in which case I get different kinds of errors, but always on the writer side. Validation on the reader side always works (as long as the XML conforms to the XSD, obviously).

Any insights would be greatly appreciated!

PS: for reference, here are the dependencies and versions the example uses:

  • org.codehaus.woodstox stax2-api 4.0.0
  • com.fasterxml.woodstox woodstox-core 5.0.2
  • net.java.dev.msv msv-core 2013.6.1
  • net.java.dev.msv xsdlib 2013.6.1

Solution

  • It appears this was a bug in Woodstox when validating on write: https://github.com/FasterXML/woodstox/issues/16

    The bug is fixed as of Woodstox release 5.0.3; however, there are still some issues with validation on write (see https://github.com/FasterXML/woodstox/issues/23).
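
If upgrading is not an option, or validation on write still misbehaves, one possible workaround is to leave validation off on the writer and check the finished output afterwards with the JDK's built-in javax.xml.validation API. This is not taken from the linked issues; it is only a minimal sketch under that assumption. The class name PostWriteValidation and the validate helper are hypothetical, and the schema and document are compact copies of the ones in the question.

```java
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical helper, not part of Woodstox or of the original example.
class PostWriteValidation {

    // Compact copy of the schema from the question.
    static final String XSD =
          "<xs:schema elementFormDefault=\"qualified\""
        + " xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
        + "<xs:element name=\"JobStatus\"><xs:complexType><xs:sequence>"
        + "<xs:element name=\"Document\" maxOccurs=\"unbounded\">"
        + "<xs:complexType><xs:sequence>"
        + "<xs:element name=\"DocumentId\" type=\"xs:string\"/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:sequence>"
        + "<xs:attribute name=\"xsdVersion\" type=\"xs:string\" use=\"required\"/>"
        + "</xs:complexType></xs:element></xs:schema>";

    // Compact copy of the document from the question.
    static final String VALID_XML =
          "<JobStatus xsdVersion=\"NA\">"
        + "<Document><DocumentId>1234567890</DocumentId></Document>"
        + "<Document><DocumentId>1234567891</DocumentId></Document>"
        + "</JobStatus>";

    /**
     * Validates xml against xsd with the JDK validator.
     * Returns null when validation succeeds, otherwise the error message
     * (this also covers the case where the schema itself cannot be parsed).
     */
    static String validate(String xsd, String xml) {
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
            // With no ErrorHandler set, validation errors are thrown as SAXException.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // In the original program, the second argument would be writer.toString().
        String error = validate(XSD, VALID_XML);
        System.out.println(error == null ? "valid" : "invalid: " + error);
    }
}
```

In the Converter example, this would mean dropping the xmlWriter.validateAgainst(schema) call and passing writer.toString() to validate once the copy loop has finished. The trade-off is that the whole document is buffered and parsed a second time, so this suits small outputs better than large streams.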