
JAXB Unmarshaling XML Retains CDATA


I am trying to unmarshal XML that contains CDATA sections. The strings I get back still have the CDATA "wrappers." I used XJC to generate the Java classes from the XSD, and they are in the jmish.jaxb package. I'm using the JAXB implementation included in Oracle's (formerly Sun's) Java 7 JDK.

The section of the XSD that defines the Product element is:

<xs:element name="Product" minOccurs="0" maxOccurs="unbounded">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="Specifications" minOccurs="0" maxOccurs="1" />
      <xs:element name="Description" type="xs:string" minOccurs="1" maxOccurs="1" msdata:Ordinal="1" />
    </xs:sequence>
    <xs:attribute name="name" type="xs:string" />
    <xs:attribute name="imageFile" type="xs:string" />
  </xs:complexType>
</xs:element>

A snippet of the XML is:

<Product name="Allure_444" imageFile="Allure_444_Ivory.jpg">
    <Description>![CDATA[444 Ivory]]</Description>
</Product>

And the unmarshaling code is:

JAXBContext jc = JAXBContext.newInstance( "jmish.jaxb" );
Unmarshaller u = jc.createUnmarshaller();
Catalog catalog = (Catalog)u.unmarshal( new FileInputStream( "bin/ProductCatalog.xml" ) );

After unmarshaling (and navigating my way down to any of the Product nodes), if I call product.getDescription(), I get:

[CDATA[444 Ivory]]

not:

444 Ivory

If the CDATA contains any character entities, they are correctly replaced (so any &lt; becomes <).

Why do the CDATA wrappers persist? In every example I've seen on this site and others, they are removed during unmarshaling. This has to be a simple problem, but I'm just not seeing it.


Solution

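<Product name="Allure_444" imageFile="Allure_444_Ivory.jpg">
    <Description>![CDATA[444 Ivory]]</Description>
</Product>

That is not a valid CDATA section. Without the leading < and the closing >, the parser does not recognize any CDATA markup: it sees ![CDATA[444 Ivory]] as ordinary character data, and JAXB hands that text back to you verbatim. A CDATA section has to look like this: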

<Product name="Allure_444" imageFile="Allure_444_Ivory.jpg">
    <Description><![CDATA[444 Ivory]]></Description>
</Product>
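
The wrappers survive because of the XML parser, not JAXB: only text that starts with <![CDATA[ and ends with ]]> is treated as a CDATA section. The string ![CDATA[444 Ivory]] contains no markup at all, so the document is still well-formed and nothing reports an error. The sketch below illustrates this with the JDK's built-in DOM parser, used in place of JAXB so that it runs with only the standard library; the class CdataCheck and its methods are hypothetical names for this example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

public class CdataCheck {

    // Parses a small XML document with the JDK's default DOM parser.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse: " + xml, e);
        }
    }

    // The text the parser reports for the root element. CDATA delimiters
    // are markup, so they never appear in this value.
    static String descriptionText(String xml) {
        return parse(xml).getDocumentElement().getTextContent();
    }

    // Reports whether the root element's first child is a CDATA section or plain text.
    static String firstChildKind(String xml) {
        Node child = parse(xml).getDocumentElement().getFirstChild();
        return child.getNodeType() == Node.CDATA_SECTION_NODE ? "CDATA section" : "text";
    }

    public static void main(String[] args) {
        String broken = "<Description>![CDATA[444 Ivory]]</Description>";
        String fixed = "<Description><![CDATA[444 Ivory]]></Description>";

        System.out.println(firstChildKind(broken) + ": " + descriptionText(broken));
        // text: ![CDATA[444 Ivory]]
        System.out.println(firstChildKind(fixed) + ": " + descriptionText(fixed));
        // CDATA section: 444 Ivory
    }
}
```

JAXB fills the Description property from the same parser-reported text, which is why getDescription() returns the wrappers for the first form and plain 444 Ivory for the second.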

You need to fix whatever is generating the XML to provide the correct syntax.
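
If the catalog happens to be written by Java code (an assumption; the question does not say what produces the XML), the JDK's StAX API can emit correctly delimited CDATA sections through XMLStreamWriter.writeCData. A minimal sketch follows; CdataWriter, productXml and writeCDataSafely are hypothetical names. Because a CDATA section cannot contain the sequence ]]>, the helper splits such text across adjacent sections.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class CdataWriter {

    // Hypothetical helper: writes one Product element whose Description
    // is wrapped in a properly delimited CDATA section.
    static String productXml(String name, String imageFile, String description) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            w.writeStartElement("Product");
            w.writeAttribute("name", name);
            w.writeAttribute("imageFile", imageFile);
            w.writeStartElement("Description");
            writeCDataSafely(w, description);
            w.writeEndElement(); // closes Description
            w.writeEndElement(); // closes Product
            w.flush();
            w.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }

    // "]]>" may not appear inside a CDATA section, so end the current section
    // right after each "]]" and start a new one that begins with ">".
    static void writeCDataSafely(XMLStreamWriter w, String text) throws XMLStreamException {
        int start = 0;
        int end;
        while ((end = text.indexOf("]]>", start)) >= 0) {
            w.writeCData(text.substring(start, end + 2));
            start = end + 2;
        }
        w.writeCData(text.substring(start));
    }

    public static void main(String[] args) {
        System.out.println(productXml("Allure_444", "Allure_444_Ivory.jpg", "444 Ivory"));
    }
}
```

Running main should print a single Product element whose Description contains <![CDATA[444 Ivory]]>. The CDATA section itself is optional here: ordinary character data such as <Description>444 Ivory</Description> unmarshals to exactly the same string.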