
How to lazy load a multi-file XSD from a ZIP at runtime using Kotlin?


A requirement emerged in which an xsd.zip file containing many schemas needs to be loaded at runtime. The ZIP file will be available on the classpath; on top of that, many of its schemas contain xsd:import directives that point to other schema files via relative paths.

# Visual of the uncompressed file: 
➜ tree -L 3 app/src/test/resources     
app/src/test/resources
├── xsd
│  └── Schemas
│     ├── A
│     ├── B
│     ├── C
│     └── ...
└── xsd.zip

In my service, I have an enum, and I want to associate one schema with each entry:

import javax.xml.transform.stream.StreamSource
import javax.xml.validation.Schema
import org.xml.sax.SAXException

enum class XmlSchemaDefinition(
    path: String,
) {
    A("Schemas/A.xsd"),
    B("Schemas/B.xsd"),
    ;

    @Throws(SAXException::class)
    fun validate(xml: String) = schema
        .newValidator()
        .validate(StreamSource(xml.byteInputStream()))
}

As you can see, my objective here is to load each schema once and create a new validator for every validation call (since a Validator is not thread-safe). However, whenever I try to load my schema via:

private val schema: Schema = run {
    val zipResourceUri: URI = Thread.currentThread()
        .contextClassLoader
        .getResource("xsd.zip")
        ?.toURI()
        ?: error("ZIP resource not found on classpath: xsd.zip")

    val zipFile = ZipFile(Paths.get(zipResourceUri).toFile())

    SchemaFactory
        .newInstance(W3C_XML_SCHEMA_NS_URI)
        // Load the schema from the ZIP entry's input stream
        .newSchema(StreamSource(zipFile.getInputStream(zipFile.getEntry(path))))
}

I get:

Caused by: org.xml.sax.SAXParseException; lineNumber: 307; columnNumber: 34; src-resolve: Cannot resolve the name 'XXX:XXXXX' to a(n) 'element declaration' component.

After further investigation, it turns out the parser is unable to resolve the xsd:import directives required by the element 'XXX:XXXXX'. How can I load my XSD from a ZIP lazily while still accommodating relative xsd:import directives, in Kotlin?


Solution

    import java.net.URI
    import java.net.URL
    import javax.xml.XMLConstants.W3C_XML_SCHEMA_NS_URI
    import javax.xml.transform.stream.StreamSource
    import javax.xml.validation.Schema
    import javax.xml.validation.SchemaFactory
    import org.apache.logging.log4j.LogManager
    import org.apache.logging.log4j.Logger
    import org.xml.sax.SAXException
    
    enum class XmlSchemaDefinition(
        path: String,
    ) {
        A("Schemas/A.xsd"),
        B("Schemas/B.xsd"),
        ;
    
        // Built on first use and then reused; Schema instances are thread-safe.
        // Loading from a jar: URL (not a bare InputStream) gives the parser a base URI,
        // so relative xsd:import locations are resolved inside the ZIP.
        private val schema: Schema by lazy {
            LOGGER.debug("Creating new schema for: {}", path)
            SchemaFactory
                .newInstance(W3C_XML_SCHEMA_NS_URI)
                .newSchema(URI("jar:$ZIP_RESOURCE_URL!/$path").toURL())
                .also { LOGGER.debug("Loaded XSD schema from ZIP, hashcode: {}", it.hashCode()) }
        }
    
        // Validator is not thread-safe, so each call creates a fresh one.
        @Throws(SAXException::class)
        fun validate(xml: String) = schema
            .newValidator()
            .validate(StreamSource(xml.byteInputStream()))
    
        companion object {
            private const val XSD_ZIP_PATH: String = "xsd.zip"
    
            // The plain log4j API Logger is enough for debug(); no cast to the core Logger needed.
            private val LOGGER: Logger = LogManager.getLogger(XmlSchemaDefinition::class.java)
    
            // getResource returns a URL, e.g. file:/.../xsd.zip when the ZIP sits on the classpath.
            private val ZIP_RESOURCE_URL: URL = Thread
                .currentThread().contextClassLoader
                .getResource(XSD_ZIP_PATH)
                ?: error("ZIP resource not found on classpath: $XSD_ZIP_PATH")
        }
    }
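To exercise the same design without needing xsd.zip on the classpath, here is a minimal sketch in which a hypothetical InlineSchemaValidator takes the XSD text directly (so no relative imports are involved). It keeps the answer's two choices: one lazily built Schema, which the javax.xml.validation documentation describes as thread-safe, and a fresh Validator per call, since a Validator is not. The thread pool size and the helper names are assumptions for illustration only.

```kotlin
import java.io.StringReader
import java.util.concurrent.Callable
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit
import javax.xml.XMLConstants.W3C_XML_SCHEMA_NS_URI
import javax.xml.transform.stream.StreamSource
import javax.xml.validation.Schema
import javax.xml.validation.SchemaFactory
import org.xml.sax.SAXException

// Hypothetical stand-in for one enum entry: the XSD text is passed in directly
// instead of being loaded from xsd.zip through a jar: URL.
class InlineSchemaValidator(private val xsd: String) {
    // Compiled once on first use, then shared; Schema is documented as thread-safe.
    private val schema: Schema by lazy {
        SchemaFactory.newInstance(W3C_XML_SCHEMA_NS_URI)
            .newSchema(StreamSource(StringReader(xsd)))
    }

    // Validator is not thread-safe, so every call creates its own.
    fun isValid(xml: String): Boolean = try {
        schema.newValidator().validate(StreamSource(StringReader(xml)))
        true
    } catch (e: SAXException) {
        false
    }
}

// Validates each document on a small thread pool, all against the one shared Schema.
fun validateConcurrently(validator: InlineSchemaValidator, docs: List<String>): List<Boolean> {
    val pool = Executors.newFixedThreadPool(4)
    try {
        val futures = docs.map { doc -> pool.submit(Callable { validator.isValid(doc) }) }
        return futures.map { it.get() }
    } finally {
        pool.shutdown()
        pool.awaitTermination(10, TimeUnit.SECONDS)
    }
}
```

With a schema declaring a single `qty` element of type `xs:positiveInteger`, `<qty>3</qty>` validates while `<qty>0</qty>` does not, regardless of which pool thread runs the check.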
    

    Explanation

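    The bulk of the issue comes from constructing the StreamSource from a bare input stream: the parser receives the schema's bytes but no system ID (base URI), so it has no context for resolving relative schemaLocation values in directives such as xsd:import. Loading the schema via an actual URL (here a jar: URL pointing into the ZIP) fixes this, because relative locations are then resolved against that URL.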
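To make the system-ID point concrete, here is a minimal sketch in which the schema contents, namespaces, and helper names are all hypothetical. It writes a two-file ZIP where Schemas/A.xsd imports Schemas/common.xsd through a relative schemaLocation. Parsing A.xsd from a bare stream is expected to fail with the same kind of src-resolve error, while passing the entry's jar: URL as the StreamSource system ID lets the import resolve. SchemaFactory.newSchema(URL) should behave like the second variant, since it builds a StreamSource from the URL's external form.

```kotlin
import java.io.File
import java.net.URI
import java.net.URL
import java.util.zip.ZipEntry
import java.util.zip.ZipOutputStream
import javax.xml.XMLConstants.W3C_XML_SCHEMA_NS_URI
import javax.xml.transform.stream.StreamSource
import javax.xml.validation.Schema
import javax.xml.validation.SchemaFactory

// Hypothetical schema that the main schema imports via a relative schemaLocation.
private val COMMON_XSD = """
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
               targetNamespace="urn:example:common">
      <xs:element name="id" type="xs:string"/>
    </xs:schema>
""".trimIndent()

// Hypothetical main schema: schemaLocation="common.xsd" is relative.
private val MAIN_XSD = """
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
               xmlns:c="urn:example:common"
               targetNamespace="urn:example:a"
               elementFormDefault="qualified">
      <xs:import namespace="urn:example:common" schemaLocation="common.xsd"/>
      <xs:element name="order">
        <xs:complexType>
          <xs:sequence>
            <xs:element ref="c:id"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>
""".trimIndent()

// Writes both schemas into a temporary ZIP under Schemas/.
fun writeDemoSchemaZip(): File {
    val zip = File.createTempFile("xsd-demo", ".zip").apply { deleteOnExit() }
    ZipOutputStream(zip.outputStream()).use { out ->
        for ((name, body) in listOf("Schemas/common.xsd" to COMMON_XSD, "Schemas/A.xsd" to MAIN_XSD)) {
            out.putNextEntry(ZipEntry(name))
            out.write(body.toByteArray(Charsets.UTF_8))
            out.closeEntry()
        }
    }
    return zip
}

// jar:<zip URI>!/<entry>, the same shape of URL the answer builds.
fun zipEntryUrl(zip: File, entry: String): URL = URI("jar:${zip.toURI()}!/$entry").toURL()

// Bytes only, no system ID: "common.xsd" has no base URI to be resolved against.
fun loadFromBareStream(url: URL): Schema = url.openStream().use { input ->
    SchemaFactory.newInstance(W3C_XML_SCHEMA_NS_URI).newSchema(StreamSource(input))
}

// Same bytes plus the entry's URL as system ID, so the import resolves inside the ZIP.
fun loadWithSystemId(url: URL): Schema = url.openStream().use { input ->
    SchemaFactory.newInstance(W3C_XML_SCHEMA_NS_URI)
        .newSchema(StreamSource(input, url.toExternalForm()))
}
```

In other words, keeping a stream-based StreamSource is also possible, as long as a system ID is supplied alongside it.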

    On top of that, we can use Kotlin's lazy() delegate, which computes the value on first access and remembers* it for subsequent calls. (* Note that if the initialization of a lazy value throws an exception, the failure is not cached: the initializer runs again on the next access.)
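Because a failed schema load is retried on the next validate() call rather than cached, it may help to see that behaviour in isolation. In this minimal sketch, the initializer of a hypothetical FlakyLazyHolder fails on its first run and succeeds on the second. It uses the default lazy() mode (LazyThreadSafetyMode.SYNCHRONIZED), the same mode the answer's `by lazy` uses.

```kotlin
// Hypothetical demo: the lazy initializer throws once, then succeeds.
class FlakyLazyHolder {
    // Counts how many times the initializer has actually run.
    var attempts = 0
        private set

    // Default lazy() mode is SYNCHRONIZED; an exception is rethrown to the caller
    // and the value stays uninitialized, so the next access runs the initializer again.
    val value: String by lazy {
        attempts++
        if (attempts == 1) error("simulated failure while loading the schema")
        "initialized after $attempts attempts"
    }
}
```

After the first access throws, the second access initializes the value, and later accesses return it without running the initializer again.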

    For convenience, here are some Makefile recipes for zipping and unzipping your test schemas (make requires recipe lines to be indented with a tab):

    zip-xsd-files-in-test-resources:
        @echo "Zipping XSD files in test resources"
        @cd app/src/test/resources/xsd && zip -r ../xsd.zip .
    
    unzip-xsd-files-in-test-resources:
        @echo "Unzipping XSD files in test resources"
        @cd app/src/test/resources && unzip -o xsd.zip -d xsd
    
    If you're wondering how the jar: scheme is able to load a plain ZIP file:

    JAR file is a file format based on the popular ZIP file format and is used for aggregating many files into one. A JAR file is essentially a zip file that contains an optional META-INF directory.
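A quick way to see this in practice: the sketch below, with hypothetical helper names, writes a ZIP that has no META-INF directory at all and reads an entry back through a jar: URL built the same way as in the answer.

```kotlin
import java.io.File
import java.net.URI
import java.util.zip.ZipEntry
import java.util.zip.ZipOutputStream

// Writes a plain ZIP (no META-INF/MANIFEST.MF) containing a single entry.
fun writePlainZip(entryName: String, content: String): File {
    val zip = File.createTempFile("plain", ".zip").apply { deleteOnExit() }
    ZipOutputStream(zip.outputStream()).use { out ->
        out.putNextEntry(ZipEntry(entryName))
        out.write(content.toByteArray(Charsets.UTF_8))
        out.closeEntry()
    }
    return zip
}

// Reads one entry through the JDK's built-in jar: URL handler.
// A missing entry surfaces as an exception (FileNotFoundException) on openStream().
fun readViaJarUrl(zip: File, entryName: String): String =
    URI("jar:${zip.toURI()}!/$entryName").toURL()
        .openStream()
        .use { it.readBytes().toString(Charsets.UTF_8) }
```

One caveat worth verifying in your own setup: the JDK's jar: handler does not open archives nested inside other archives. If xsd.zip ends up packaged inside the application JAR, getResource returns a jar:file:...!/xsd.zip URL, and the resulting jar:jar:... URL is unlikely to open. Keeping the ZIP as a standalone file on the classpath, as in the test resources above, avoids that.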