I am trying to unmarshal XML that contains CDATA elements. The strings I get back still have the CDATA "wrappers." I have used XJC to create the Java classes from the XSD and they are in the jmish.jaxb package. I'm using the JAXB that is included in Oracle's (Sun's) Java 7 JDK.
The section of the XSD that defines the Product element is:
<xs:element name="Product" minOccurs="0" maxOccurs="unbounded">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="Specifications" minOccurs="0" maxOccurs="1" />
      <xs:element name="Description" type="xs:string" minOccurs="1" maxOccurs="1" msdata:Ordinal="1" />
    </xs:sequence>
    <xs:attribute name="name" type="xs:string" />
    <xs:attribute name="imageFile" type="xs:string" />
  </xs:complexType>
</xs:element>
A snippet of the XML is:
<Product name="Allure_444" imageFile="Allure_444_Ivory.jpg">
<Description>![CDATA[444 Ivory]]</Description>
</Product>
And the unmarshaling code is:
JAXBContext jc = JAXBContext.newInstance( "jmish.jaxb" );
Unmarshaller u = jc.createUnmarshaller();
Catalog catalog = (Catalog)u.unmarshal( new FileInputStream( "bin/ProductCatalog.xml" ) );
After unmarshaling (and navigating my way down to any of the Product nodes), if I call product.getDescription(), I get:
[CDATA[444 Ivory]]
not:
444 Ivory
If the CDATA contains any character entities, they are correctly replaced (so any &lt; becomes <).
Why do the CDATA wrappers persist? In every example I've seen on this site and others, they are removed during unmarshaling. This has to be a simple problem, but I'm just not seeing it.
<Product name="Allure_444" imageFile="Allure_444_Ivory.jpg">
<Description>![CDATA[444 Ivory]]</Description>
</Product>
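That is not a valid CDATA wrapper. The opening < and the closing > are missing, so the parser treats the whole thing as ordinary character data and returns it verbatim. It should look like this: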
<Product name="Allure_444" imageFile="Allure_444_Ivory.jpg">
<Description><![CDATA[444 Ivory]]></Description>
</Product>
You need to fix whatever is generating the XML to provide the correct syntax.
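To see the difference for yourself, here is a minimal sketch that parses both forms of the Description element. It uses the JDK's DOM parser rather than JAXB so that it runs without the generated classes; the class name CdataDemo and the helper descriptionText are made up for this illustration. The handling of CDATA happens in the XML parser, so JAXB should hand you the same strings.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class CdataDemo {

    // Hypothetical helper: parses the XML string and returns the text
    // content of the first <Description> element.
    public static String descriptionText(String xml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return doc.getElementsByTagName("Description").item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Missing '<' and '>': this is plain text that happens to look like CDATA.
        String malformed =
                "<Product><Description>![CDATA[444 Ivory]]</Description></Product>";
        // A real CDATA section: the parser strips the delimiters.
        String wellFormed =
                "<Product><Description><![CDATA[444 Ivory]]></Description></Product>";

        System.out.println(descriptionText(malformed));  // ![CDATA[444 Ivory]]
        System.out.println(descriptionText(wellFormed)); // 444 Ivory
    }
}
```

This also explains why your entities are being replaced: in ordinary character data the parser expands them, whereas inside a real CDATA section they would be left untouched.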