java security xsd jaxb unmarshalling

Demonstrate that JAXB unmarshalling will not load an XSD schema


Related to this question, given the following schema called customer.xsd:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:simpleType name="stringMaxSize5">
        <xs:restriction base="xs:string">
            <xs:maxLength value="5"/>
        </xs:restriction>
    </xs:simpleType>
    <xs:element name="customer">
        <xs:complexType>
            <xs:sequence>
                <xs:element name="name" type="stringMaxSize5"/>
                <xs:element ref="phone-number" maxOccurs="2"/>
            </xs:sequence>
        </xs:complexType>
    </xs:element>
    <xs:element name="phone-number">
        <xs:complexType>
            <xs:sequence/>
        </xs:complexType>
    </xs:element>
</xs:schema>

The following XML document called input.xml:

<customer
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="./customer.xsd">
    <name>Jane Doe</name>
    <phone-number/>
    <phone-number/>
    <phone-number/>
</customer>

And the following unmarshalling code:

import java.io.File;
import java.util.ArrayList;
import java.util.List;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.Unmarshaller;
import javax.xml.bind.annotation.XmlElement;
import javax.xml.bind.annotation.XmlRootElement;

public class Unmarshal {

    @XmlRootElement(name = "customer")
    public static class Customer {
        public String name;
        @XmlElement(name="phone-number")
        public List<PhoneNumber> phoneNumbers = new ArrayList<PhoneNumber>();
    }

    public static class PhoneNumber {}

    public static void main(String[] args) throws Exception {
        JAXBContext jc = JAXBContext.newInstance(Customer.class);
        Unmarshaller unmarshaller = jc.createUnmarshaller();
        unmarshaller.setSchema(null);
        Customer customer = (Customer) unmarshaller.unmarshal(new File("input.xml"));
        System.out.println(customer.name);
    }

}

The Java code is able to deserialize the XML input document into an instance of Customer even though the document fails schema validation in two ways (as reported by an external editor):

cvc-maxLength-valid: Value 'Jane Doe' with length = '8' is not facet-valid with respect to maxLength '5' for type 'stringMaxSize5'.xml(cvc-maxLength-valid)
cvc-type.3.1.3: The value 'Jane Doe' of element 'name' is not valid.xml(cvc-type.3.1.3)

and

cvc-complex-type.2.4.f: 'phone-number' can occur a maximum of '2' times in the current sequence. This limit was exceeded. No child element is expected at this point.xml(cvc-complex-type.2.4.f)

This means that JAXB did not validate the given XML input during unmarshalling. However:

Given that unmarshaller.setSchema(null) was called to DISABLE schema validation, is there a way to demonstrate that the content of the customer.xsd file WAS NOT accessed by the JVM when unmarshalling occurred?

In other words, is there a way to not blindly trust that the JVM won't load XSD references, even when the schema is explicitly set to null?

Update 1:

The purpose is to find out how likely XSD schema references inside an XML document are to become a security attack vector, as described by:

Thanks.


Solution

  • I don't truly understand the need to check this, but here is a simple use case that should show that the XSD file is not loaded by the JVM when unmarshalling XML:

    • Create the Java classes from the XSD with any plugin you want or, as here, write them by hand. Package them in a JAR file that contains only the Java classes, without the XSD.
    • Write the XML unmarshalling code in another Java program that depends on the first JAR, and run it with no XSD anywhere on your classpath. Your code will still work without knowing the XSD schema.

    You can also look at the Java code itself:

    • com.sun.xml.bind.v2.runtime.unmarshaller.UnmarshallerImpl, which implements the Unmarshaller interface, has its schema property set to null by default (so setting it to null in your code above is a no-op).
    • Your code does not reference the XSD from which you created the XML-annotated POJOs: the JVM cannot guess that you wrote your code based on an external schema.
    • You could also run your program in debug mode and see that the XSD is not loaded (again, there is no reference to it in your Java code: you could even delete it and it would make no difference).
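One way to turn the points above into something observable is to put a recording EntityResolver in front of the parser and check whether the schema hint is ever requested. The sketch below is not JAXB itself: it uses only the JDK's SAX parser, on the assumption that the JAXB reference implementation hands File input to a namespace-aware, non-validating SAX parser. The class name SchemaLoadProbe, the method probe, and the stand-in schema string are made up for illustration. The second run turns on XSD validation purely as a contrast, to show that the probe does see schema lookups when they happen.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SchemaLoadProbe {

    // Same shape as input.xml from the question, including the schema hint.
    static final String XML =
        "<customer xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\""
        + " xsi:noNamespaceSchemaLocation=\"./customer.xsd\">"
        + "<name>Jane Doe</name><phone-number/><phone-number/><phone-number/>"
        + "</customer>";

    // Stand-in schema returned by the resolver, so no real file is opened.
    static final String XSD =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
        + "<xs:element name=\"customer\"/></xs:schema>";

    /** Parses XML and returns every system id the parser asked to resolve. */
    static List<String> probe(boolean validate) {
        List<String> requested = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public InputSource resolveEntity(String publicId, String systemId) {
                requested.add(systemId); // record the lookup
                InputSource source = new InputSource(new StringReader(XSD));
                source.setSystemId(systemId);
                return source;
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setValidating(validate);
            SAXParser parser = factory.newSAXParser();
            if (validate) {
                // JAXP property that switches validation to W3C XML Schema.
                parser.setProperty(
                    "http://java.sun.com/xml/jaxp/properties/schemaLanguage",
                    "http://www.w3.org/2001/XMLSchema");
            }
            // The handler also serves as the parser's entity resolver.
            parser.parse(new InputSource(new StringReader(XML)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return requested;
    }

    public static void main(String[] args) {
        System.out.println("non-validating lookups: " + probe(false));
        System.out.println("validating lookups:     " + probe(true));
    }
}
```

With JAXB on the classpath, the same idea should carry over by setting the resolver on an XMLReader, wrapping it in a javax.xml.transform.sax.SAXSource, and passing that to unmarshaller.unmarshal(Source), so the check runs against the actual unmarshalling call rather than a stand-in parser.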

    Still, I hope this answers your question.