java, spring-boot, jaxb, xsd-validation

Validate XML against XSD which imports big enum XSD


I have an XSD which imports a big XSD containing more than 200k <enumeration> elements. I need to validate incoming XML against the root XSD:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified"
    targetNamespace="http://example.com/test/test/types/2.0.0"
    xmlns:tns="http://example.com/test/test/types/2.0.0"
    xmlns:mun="http://example.com/test/test/enumXsd/types/2.0">
    <xs:import namespace="http://example.com/test/test/enumXsd/types/2.0"
        schemaLocation="common/enumXsd.xsd"/>
    <xs:element name="test" type="tns:test"/>
    <xs:complexType name="test">
        <xs:sequence>
            <xs:element name="abc" type="mun:EnumEType"/>
        </xs:sequence>
    </xs:complexType>
</xs:schema>

enumXsd.xsd:

<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns:tns="http://example.com/test/test/enumXsd/types/2.0"
    elementFormDefault="qualified"
    targetNamespace="http://example.com/test/test/enumXsd/types/2.0"
    xmlns="http://www.w3.org/2001/XMLSchema">
    <simpleType name="EnumEType">
        <restriction base="string">
            <enumeration value="w1" />
            <enumeration value="2wr" />
            <enumeration value="3erwer" />
            <enumeration value="5rtete" />
            <!-- ... etc. ... -->
        </restriction>
    </simpleType>
</schema>

The getSchema() method takes too long (more than 15 seconds) to build the schema; the time is spent in factory.newSchema(schemaFile). Is there any way to improve this? Caching the schema is not a solution, because it takes a lot of memory.

    // Assigned in getSchema() but never read back, so the schema is re-parsed on every call.
    private Schema cachedSchema;

    private void validate(Source source) throws SAXException, IOException {
        Schema schema = getSchema();
        if (schema != null) {
            // A fresh Validator is created for every document.
            Validator validator = schema.newValidator();
            validator.validate(source);
        }
    }

    public Schema getSchema() {
        String path = getSchemaPath();
        if (path == null) {
            return null;
        }
        ClassLoader loader = getClass().getClassLoader();
        try (BufferedInputStream xsdStream = new BufferedInputStream(loader.getResourceAsStream(path))) {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            StreamSource schemaFile = new StreamSource(xsdStream);
            // The system ID lets the parser resolve the relative schemaLocation of the import.
            schemaFile.setSystemId(loader.getResource(path).toString());
            // Slow step: parses the root XSD and the imported enum XSD.
            cachedSchema = factory.newSchema(schemaFile);
            return cachedSchema;
        } catch (SAXException | IOException e) {
            throw new RuntimeException("Error loading schema " + path, e);
        }
    }

Solution

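  • As discussed in the comments, and after analysing the JFR recording you provided, the best solution is to cache only the Validator instance obtained from the parsed schema, so that the schema is not rebuilt on every call.

    Hope this solves your problem.

    Beware of threading issues: according to the javax.xml.validation Javadoc, a Validator is not thread-safe and not reentrant, so a cached instance must not be used by more than one thread at a time (synchronize access to it, or keep one instance per thread).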
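
A minimal sketch of the caching suggested above, assuming a single shared Validator guarded by a lock is acceptable for your throughput. The class name CachedValidator and its constructor are hypothetical and not part of the original code. Access is synchronized because a Validator must not be used by two threads at once.

```java
import java.io.IOException;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

/** Hypothetical helper: parses the schema once and reuses one Validator. */
final class CachedValidator {

    // The only object kept after construction.
    private final Validator validator;

    CachedValidator(Source schemaSource) throws SAXException {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        // Expensive step (newSchema), performed once instead of per document.
        this.validator = factory.newSchema(schemaSource).newValidator();
    }

    /**
     * Validates one document; throws SAXException if it is invalid.
     * Synchronized so that only one thread uses the Validator at a time.
     */
    synchronized void validate(Source xml) throws SAXException, IOException {
        validator.validate(xml);
    }
}
```

In the original class this would replace the per-call getSchema(): build one CachedValidator at startup from the same StreamSource (with its system ID set so the import resolves) and call validate(source) on it. If lock contention becomes a problem, one Validator per thread (for example via a ThreadLocal) is an alternative, at the cost of keeping more instances alive.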