Tags: java, xml-parsing, xstream, xml-deserialization

Parsing XML with references to previous tags, and with children corresponding to subtypes of some class


I have to deal with (a variation of) the following scenario. My model classes are:

class Car {
    String brand;
    Engine engine;
}

abstract class Engine {
}

class V12Engine extends Engine {
    int horsePowers;
}

class V6Engine extends Engine {
    String fuelType;
}

And I have to deserialize (no need for serialization support ATM) the following input:

<list>

    <brand id="1">
        Volvo
    </brand>

    <car>
        <brand>BMW</brand>
        <v12engine horsePowers="300" />
    </car>

    <car>
        <brand refId="1" />
        <v6engine fuel="unleaded" />
    </car>

</list>

What I've tried / issues:

I've tried using XStream, but it expects me to write tags such as:

<engine class="cars.V12Engine">
    <horsePowers>300</horsePowers>
</engine>

etc. I don't want an <engine> tag; I want a <v6engine> tag or a <v12engine> tag.

Also, I need to be able to refer back to "predefined" brands based on identifiers, as shown with the brand id above (for instance by maintaining a Map<Integer, String> predefinedBrands during the deserialization). I don't know if XStream is well suited for such a scenario.

I realize that this could be done "manually" with a push or pull parser (such as SAX or StAX) or a DOM library. I would however prefer to have some more automation. Ideally, I should be able to add classes (such as new Engines) and start using them in the XML right away. (XStream is by no means a requirement; the most elegant solution wins the bounty.)


Solution

  • JAXB (javax.xml.bind) can do everything you're after, though some bits are easier than others. For the sake of simplicity I'm going to assume that all your XML files have a namespace - it's trickier if they don't but can be worked around using the StAX APIs.

    <list xmlns="http://example.com/cars">
    
        <brand id="1">
            Volvo
        </brand>
    
        <car>
            <brand>BMW</brand>
            <v12engine horsePowers="300" />
        </car>
    
        <car>
            <brand refId="1" />
            <v6engine fuel="unleaded" />
        </car>
    
    </list>
    

    and assume a corresponding package-info.java of

    @XmlSchema(namespace = "http://example.com/cars",
               elementFormDefault = XmlNsForm.QUALIFIED)
    package cars;
    import javax.xml.bind.annotation.*;
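
    For files that have no namespace, one way to apply the StAX workaround mentioned above is to wrap the reader in a StreamReaderDelegate that reports the expected namespace for elements that carry none. This is a minimal sketch rather than a definitive implementation: DefaultNamespaceReader and its qualifiedNames helper are hypothetical names introduced here, and which accessor a particular JAXB implementation consults (getNamespaceURI() or getName()) is an assumption, so both are overridden.

    ```java
    import java.io.StringReader;
    import java.util.ArrayList;
    import java.util.List;
    import javax.xml.namespace.QName;
    import javax.xml.stream.XMLInputFactory;
    import javax.xml.stream.XMLStreamConstants;
    import javax.xml.stream.XMLStreamException;
    import javax.xml.stream.XMLStreamReader;
    import javax.xml.stream.util.StreamReaderDelegate;

    // Hypothetical helper (not part of the original answer): makes elements
    // without a namespace appear to be in a fixed one.
    class DefaultNamespaceReader extends StreamReaderDelegate {
        private final String namespace;

        DefaultNamespaceReader(XMLStreamReader reader, String namespace) {
            super(reader);
            this.namespace = namespace;
        }

        @Override
        public String getNamespaceURI() {
            String uri = super.getNamespaceURI();
            boolean onElement = isStartElement() || isEndElement();
            // only substitute on element events, and only when nothing is declared
            return (onElement && (uri == null || uri.isEmpty())) ? namespace : uri;
        }

        @Override
        public QName getName() {
            QName name = super.getName();
            return name.getNamespaceURI().isEmpty()
                    ? new QName(namespace, name.getLocalPart())
                    : name;
        }

        // Demo helper: the qualified names of all start tags as the wrapped reader sees them.
        static List<String> qualifiedNames(String xml, String namespace) {
            List<String> names = new ArrayList<>();
            try {
                XMLStreamReader raw = XMLInputFactory.newFactory()
                        .createXMLStreamReader(new StringReader(xml));
                XMLStreamReader reader = new DefaultNamespaceReader(raw, namespace);
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        names.add(reader.getName().toString());
                    }
                }
                reader.close();
            } catch (XMLStreamException e) {
                throw new IllegalStateException(e);
            }
            return names;
        }
    }
    ```

    The intended use is to hand such a wrapped reader to Unmarshaller.unmarshal(XMLStreamReader) in place of the raw one; elements that already declare a namespace are left untouched.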
    

    Engine type by element name

    This is simple, using @XmlElementRef:

    package cars;
    import javax.xml.bind.annotation.*;
    
    @XmlRootElement
    @XmlAccessorType(XmlAccessType.FIELD)
    public class Car {
        String brand;
        @XmlElementRef
        Engine engine;
    }
    
    @XmlRootElement
    abstract class Engine {
    }
    
    @XmlRootElement(name = "v12engine")
    @XmlAccessorType(XmlAccessType.FIELD)
    class V12Engine extends Engine {
        @XmlAttribute
        int horsePowers;
    }
    
    @XmlRootElement(name = "v6engine")
    @XmlAccessorType(XmlAccessType.FIELD)
    class V6Engine extends Engine {
        // override the default attribute name, which would be fuelType
        @XmlAttribute(name = "fuel")
        String fuelType;
    }
    

    The various types of Engine are all annotated @XmlRootElement and marked with appropriate element names. At unmarshalling time the element name found in the XML is used to decide which of the Engine subclasses to use. So given XML of

    <car xmlns="http://example.com/cars">
        <brand>BMW</brand>
        <v12engine horsePowers="300" />
    </car>
    

    and the unmarshalling code below, the assertions at the end all hold:

    import javax.xml.bind.*;
    import java.io.File;

    // every concrete Engine subclass has to be passed to the context
    JAXBContext ctx = JAXBContext.newInstance(Car.class, V6Engine.class, V12Engine.class);
    Unmarshaller um = ctx.createUnmarshaller();
    Car c = (Car)um.unmarshal(new File("file.xml"));
    
    assert "BMW".equals(c.brand);
    assert c.engine instanceof V12Engine;
    assert ((V12Engine)c.engine).horsePowers == 300;
    

    To add a new type of Engine simply create the new subclass, annotate it with @XmlRootElement as appropriate, and add this new class to the list passed to JAXBContext.newInstance().
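
    To make the idea of choosing the subclass from the element name concrete without a JAXB dependency, here is a hand-written analogue using plain StAX. It is only an illustration of the dispatch, not how JAXB is implemented: EngineDispatchSketch, its FACTORIES registry and the readV12/readV6 methods are hypothetical names, and the nested model classes simply mirror the ones from the question.

    ```java
    import java.io.StringReader;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.function.Function;
    import javax.xml.stream.XMLInputFactory;
    import javax.xml.stream.XMLStreamException;
    import javax.xml.stream.XMLStreamReader;

    class EngineDispatchSketch {
        static abstract class Engine { }
        static class V12Engine extends Engine { int horsePowers; }
        static class V6Engine extends Engine { String fuelType; }

        // element name -> factory that reads the attributes of the current start tag
        static final Map<String, Function<XMLStreamReader, Engine>> FACTORIES = new HashMap<>();
        static {
            FACTORIES.put("v12engine", EngineDispatchSketch::readV12);
            FACTORIES.put("v6engine", EngineDispatchSketch::readV6);
            // a new engine type would need one more entry here
        }

        static Engine readV12(XMLStreamReader r) {
            V12Engine e = new V12Engine();
            e.horsePowers = Integer.parseInt(r.getAttributeValue(null, "horsePowers"));
            return e;
        }

        static Engine readV6(XMLStreamReader r) {
            V6Engine e = new V6Engine();
            e.fuelType = r.getAttributeValue(null, "fuel");
            return e;
        }

        // Parses a document whose root element is one of the registered engine elements.
        static Engine parse(String xml) {
            try {
                XMLStreamReader r = XMLInputFactory.newFactory()
                        .createXMLStreamReader(new StringReader(xml));
                r.nextTag(); // move to the root start tag
                Function<XMLStreamReader, Engine> factory = FACTORIES.get(r.getLocalName());
                if (factory == null) {
                    throw new IllegalArgumentException("unknown engine element: " + r.getLocalName());
                }
                return factory.apply(r);
            } catch (XMLStreamException e) {
                throw new IllegalStateException(e);
            }
        }
    }
    ```

    With JAXB the registry is implied by the @XmlRootElement names of the classes known to the JAXBContext, so none of this bookkeeping has to be written by hand.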

    Cross-references for brands

    JAXB has a cross-referencing mechanism based on @XmlID and @XmlIDREF but these require that the ID attribute be a valid XML ID, i.e. an XML name, and in particular not entirely consisting of digits. But it's not too difficult to keep track of the cross references yourself, as long as you don't require "forward" references (i.e. a <car> that refers to a <brand> that has not yet been "declared").
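
    To see why id="1" is a problem for @XmlID, it can help to check candidate identifiers against the XML name rules. The sketch below leans on the DOM API, which rejects illegal names in Document.createElement. XmlNameCheck is a hypothetical helper, and the check is for XML names in general, which is slightly more permissive than what a schema-level ID allows (a colon passes here, for example).

    ```java
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.parsers.ParserConfigurationException;
    import org.w3c.dom.DOMException;
    import org.w3c.dom.Document;

    // Hypothetical helper: true if the candidate is acceptable as an XML name.
    class XmlNameCheck {
        static boolean isXmlName(String candidate) {
            try {
                Document doc = DocumentBuilderFactory.newInstance()
                        .newDocumentBuilder().newDocument();
                doc.createElement(candidate); // throws DOMException for an illegal name
                return true;
            } catch (DOMException e) {
                return false;
            } catch (ParserConfigurationException e) {
                throw new IllegalStateException(e);
            }
        }
    }
    ```

    Here isXmlName("1") is false while isXmlName("b1") is true, which is why the numeric ids in this input call for the manual bookkeeping described above.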

    The first step is to define a JAXB class to represent the <brand> element:

    package cars;
    
    import javax.xml.bind.annotation.*;
    
    @XmlRootElement
    public class Brand {
      @XmlValue // i.e. the simple content of the <brand> element
      String name;
    
      // optional id and refId attributes (optional because they're
      // Integer rather than int)
      @XmlAttribute
      Integer id;
    
      @XmlAttribute
      Integer refId;
    }
    

    Now we need a "type adapter" to convert between the Brand object and the String required by Car, and to maintain the id/ref mapping:

    package cars;
    
    import javax.xml.bind.annotation.adapters.*;
    import java.util.*;
    
    public class BrandAdapter extends XmlAdapter<Brand, String> {
      private Map<Integer, Brand> brandCache = new HashMap<Integer, Brand>();
    
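      // marshalling is not needed here, so this is left unimplemented
      @Override
      public Brand marshal(String s) {
        return null;
      }
    
      @Override
      public String unmarshal(Brand b) {
        if(b == null) return null;
        if(b.id != null) {
          // this is a <brand id="..."> - cache it
          brandCache.put(b.id, b);
        }
        if(b.refId != null) {
          // this is a <brand refId="..."> - pull it from the cache
          // (null if that id has not been declared yet)
          b = brandCache.get(b.refId);
        }
    
        // and extract the name, guarding against an unresolved refId
        return (b == null || b.name == null) ? null : b.name.trim();
      }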
    }
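
    The id/refId bookkeeping can also be tried out on plain values, without any JAXB classes on the classpath. The sketch below is a hypothetical stand-in (BrandResolverSketch is not part of the answer's code) that applies the same rules, with one deliberate difference: it throws on a refId that has not been declared yet, which makes the "no forward references" restriction visible instead of silently yielding null.

    ```java
    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical stand-in for the adapter's caching rules, using plain values.
    class BrandResolverSketch {
        private final Map<Integer, String> namesById = new HashMap<>();

        // id defines a brand, refId looks one up, text is the element content
        String resolve(Integer id, Integer refId, String text) {
            String name = (text == null) ? null : text.trim();
            if (id != null) {
                namesById.put(id, name); // <brand id="...">name</brand>
            }
            if (refId != null) {
                if (!namesById.containsKey(refId)) {
                    throw new IllegalStateException(
                            "brand refId " + refId + " used before its definition");
                }
                name = namesById.get(refId); // <brand refId="..." />
            }
            return name;
        }
    }
    ```

    Feeding it the three <brand> elements of the sample document in document order yields Volvo, BMW and Volvo; asking for a refId that has not been seen yet fails immediately.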
    

    We link the adapter to the brand field of Car using another annotation:

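    package cars;
    
    import javax.xml.bind.annotation.*;
    import javax.xml.bind.annotation.adapters.XmlJavaTypeAdapter;
    
    @XmlRootElement
    @XmlAccessorType(XmlAccessType.FIELD)
    public class Car {
        // <brand> is read as a Brand and converted to a String by the adapter
        @XmlJavaTypeAdapter(BrandAdapter.class)
        String brand;
        @XmlElementRef
        Engine engine;
    }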
    

    The final part of the puzzle is to ensure that <brand> elements found at the top level get saved in the cache. Here is a complete example:

    package cars;
    
    import javax.xml.bind.*;
    import java.io.File;
    import java.util.*;
    
    import javax.xml.stream.*;
    import javax.xml.transform.stream.StreamSource;
    
    public class Main {
      public static void main(String[] argv) throws Exception {
        List<Car> cars = new ArrayList<Car>();
    
        JAXBContext ctx = JAXBContext.newInstance(Car.class, V12Engine.class, V6Engine.class, Brand.class);
        Unmarshaller um = ctx.createUnmarshaller();
    
        // create an adapter, and register it with the unmarshaller
        BrandAdapter ba = new BrandAdapter();
        um.setAdapter(BrandAdapter.class, ba);
    
        // create a StAX XMLStreamReader to read the XML file
        XMLInputFactory xif = XMLInputFactory.newFactory();
        XMLStreamReader xsr = xif.createXMLStreamReader(new StreamSource(new File("file.xml")));
    
        xsr.nextTag(); // root <list> element
        xsr.nextTag(); // first <brand> or <car> child
    
        // read each <brand>/<car> in turn
        while(xsr.getEventType() == XMLStreamConstants.START_ELEMENT) {
          Object obj = um.unmarshal(xsr);
    
          // unmarshal from an XMLStreamReader leaves the reader pointing at
          // the event *after* the closing tag of the element we read.  If there
          // was a text node between the closing tag of this element and the opening
          // tag of the next then we will need to skip it.
          if(xsr.getEventType() != XMLStreamConstants.START_ELEMENT
              && xsr.getEventType() != XMLStreamConstants.END_ELEMENT) {
            xsr.nextTag();
          }
    
          if(obj instanceof Brand) {
            // top-level <brand> - hand it to the BrandAdapter so it can be
            // cached if necessary
            ba.unmarshal((Brand)obj);
          }
          if(obj instanceof Car) {
            cars.add((Car)obj);
          }
        }
        xsr.close();
    
        // at this point, cars contains all the Car objects we found, with
        // any <brand> refIds resolved.
      }
    }
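
    The cursor handling in that loop is the fiddly part, so here is the same traversal reduced to plain StAX. It is a sketch under one assumption: the hypothetical consumeElement method stands in for um.unmarshal(xsr) and imitates the behaviour described in the comment above, i.e. it leaves the reader on the event after the element's closing tag. TopLevelWalkSketch and topLevelNames are names made up for this illustration.

    ```java
    import java.io.StringReader;
    import java.util.ArrayList;
    import java.util.List;
    import javax.xml.stream.XMLInputFactory;
    import javax.xml.stream.XMLStreamConstants;
    import javax.xml.stream.XMLStreamException;
    import javax.xml.stream.XMLStreamReader;

    class TopLevelWalkSketch {
        // Stand-in for Unmarshaller.unmarshal(XMLStreamReader): consumes the current
        // element and leaves the reader on the event after its end tag.
        static void consumeElement(XMLStreamReader r) throws XMLStreamException {
            int depth = 1;
            while (depth > 0) {
                int event = r.next();
                if (event == XMLStreamConstants.START_ELEMENT) depth++;
                else if (event == XMLStreamConstants.END_ELEMENT) depth--;
            }
            r.next(); // step past the end tag
        }

        // Returns the names of the direct children of the root element, in order.
        static List<String> topLevelNames(String xml) {
            List<String> names = new ArrayList<>();
            try {
                XMLStreamReader r = XMLInputFactory.newFactory()
                        .createXMLStreamReader(new StringReader(xml));
                r.nextTag(); // root element
                r.nextTag(); // first child, or the root's end tag if it is empty
                while (r.getEventType() == XMLStreamConstants.START_ELEMENT) {
                    names.add(r.getLocalName());
                    consumeElement(r);
                    // skip whitespace between siblings, if there is any
                    if (r.getEventType() != XMLStreamConstants.START_ELEMENT
                            && r.getEventType() != XMLStreamConstants.END_ELEMENT) {
                        r.nextTag();
                    }
                }
                r.close();
            } catch (XMLStreamException e) {
                throw new IllegalStateException(e);
            }
            return names;
        }
    }
    ```

    The loop ends when the reader reaches the closing tag of the root element, whether or not whitespace separates the children.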