I'm currently trying to parse the Japanese JMdict XML document, which declares a bunch of ENTITY
references that are used throughout the document.
For example:
<!ENTITY MA "martial arts term">
<!ENTITY X "rude or X-rated term (not displayed in educational software)">
<!ENTITY abbr "abbreviation">
<!ENTITY adj-i "adjective (keiyoushi)">
<!ENTITY adj-ix "adjective (keiyoushi) - yoi/ii class">
These are then referenced in the XML like so: <field>&MA;</field>
XStream does not like this: it demands that I fix it, then promptly throws a ConversionException
and quits.
Is there a way to automatically recognize these entities and swap them out?
I'd prefer not having to write 170 lines of xml = xml.replace(one, other);
I'm just using XPP3 and then annotations to create POJOs from the data to begin with. No custom parser.
Since you say you're using XPP3, I assume that you are creating your XStream object like this:
XStream xstream = new XStream(); //uses XPP3
The problem is that XPP3 apparently does not resolve entities out of the box:
"...it is user responsibility to resolve entity reference."
So unless you want to implement entity resolution, you need to use a parser that does resolve entities. If you want to stay with a pull parser, you can use StAX like this:
XStream xstream = new XStream(new StaxDriver());
Alternatively you could use DOM (not a pull parser; loads the entire document into memory):
XStream xstream = new XStream(new DomDriver());
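To see entity resolution in isolation, here is a minimal sketch that uses only the JDK's built-in StAX API, without XStream. The class name EntityDemo, the method readFields, and the tiny inline sample document are made up for illustration; the sample only mimics the JMdict layout (entities declared in the DOCTYPE, then referenced inside <field>). It assumes the default JDK StAX implementation, which replaces internal entity references with their text and reports them as ordinary character data.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class EntityDemo {
    // Hypothetical miniature of the JMdict layout: entities are declared in
    // the internal DTD subset and then referenced from element content.
    static final String SAMPLE =
        "<!DOCTYPE entry ["
        + "<!ENTITY MA \"martial arts term\">"
        + "<!ENTITY abbr \"abbreviation\">"
        + "]><entry><field>&MA;</field><field>&abbr;</field></entry>";

    // Returns the text of every <field> element, with entities expanded.
    static List<String> readFields(String xml) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Both properties are normally on by default; set explicitly for clarity.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.TRUE);
        factory.setProperty(XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES, Boolean.TRUE);

        List<String> fields = new ArrayList<>();
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            StringBuilder text = null; // non-null only while inside a <field>
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && "field".equals(reader.getLocalName())) {
                    text = new StringBuilder();
                } else if (event == XMLStreamConstants.CHARACTERS && text != null) {
                    // Expanded entity text arrives here as plain characters.
                    text.append(reader.getText());
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && "field".equals(reader.getLocalName())) {
                    fields.add(text.toString());
                    text = null;
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse sample XML", e);
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(readFields(SAMPLE));
    }
}
```

If this prints the expanded descriptions rather than failing on &MA;, the parser is doing the substitution for you, and no chain of xml.replace(...) calls should be needed.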