
How to read and print an external (unparsed) general entity declaration in XML with dom4j


    String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>\n"
            + "<!DOCTYPE xml [<!ENTITY copy \"&#169;\"> "
            //the next line is missing in output
            + "<!ENTITY logo SYSTEM \"http://www.xmlwriter.net/logo.gif\" NDATA gif>"
            + "<!ENTITY deg \"&#x00b0;\"> ]>\n" + "<root />";

    SAXReader reader = new SAXReader(false);
    reader.setIncludeInternalDTDDeclarations(true);
    reader.setIncludeExternalDTDDeclarations(true);

    Document doc = reader.read(new StringReader(xml));
    StringWriter wr = new StringWriter();
    XMLWriter writer = new XMLWriter(wr);
    writer.write(doc);

    String xml2 = wr.toString();
    System.out.println(xml2);

That is the example. Here is the output I get:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE xml [
  <!ENTITY copy "©">
  <!ENTITY deg "°">
]><root/>

As you can see, one line is missing. I did some research: the entity declarations that are printed are internal (parsed) entity declarations, and the missing line is an external (unparsed) entity declaration.

I want to read the XML, change some values, and write it back out without losing any data.

My question is:

1) Where is the problem? Was the missing declaration read into the Document object and then dropped by the writer (for example, because I missed some configuration), or was it never read in the first place?

2) How can I fix the problem?


Solution

  • Answers:

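    1) The problem is on the reading side, which is pretty obvious from the source of SAXContentHandler. The callback for unparsed entity declarations is empty, so the declaration never reaches the Document and the writer has nothing to print: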

    public void unparsedEntityDecl(String name, String publicId,
            String systemId, String notationName) throws SAXException {
        // #### not supported yet!
    }
    

    2) One option might be to extend SAXContentHandler, create an UnparsedEntityDecl class to hold the declaration, and set up a custom XMLReader that uses the extended handler. It is probably easier to try another library, perhaps JDOM2.
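
    To see what the underlying SAX parser reports before dom4j discards it, here is a minimal sketch that uses only the JDK's SAX classes, not dom4j. The class names UnparsedEntityCapture and Collector and the collect method are hypothetical, made up for this illustration. It records each unparsedEntityDecl callback and rebuilds the declaration as DTD text. An extended SAXContentHandler could presumably do something similar, but that part is not shown here.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class UnparsedEntityCapture {

    /** Hypothetical helper: stores each unparsed entity declaration as DTD text. */
    static class Collector extends DefaultHandler {
        final List<String> declarations = new ArrayList<>();

        // Same callback that dom4j's SAXContentHandler leaves empty.
        @Override
        public void unparsedEntityDecl(String name, String publicId,
                String systemId, String notationName) {
            StringBuilder sb = new StringBuilder("<!ENTITY ").append(name);
            if (publicId != null) {
                sb.append(" PUBLIC \"").append(publicId)
                  .append("\" \"").append(systemId).append('"');
            } else {
                sb.append(" SYSTEM \"").append(systemId).append('"');
            }
            sb.append(" NDATA ").append(notationName).append('>');
            declarations.add(sb.toString());
        }
    }

    /** Parses the XML with a non-validating SAX parser and returns the captured declarations. */
    static List<String> collect(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(false);
        XMLReader xmlReader = factory.newSAXParser().getXMLReader();

        Collector collector = new Collector();
        xmlReader.setDTDHandler(collector);      // receives unparsedEntityDecl
        xmlReader.setContentHandler(collector);  // element events are ignored here
        xmlReader.parse(new InputSource(new StringReader(xml)));
        return collector.declarations;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>\n"
                + "<!DOCTYPE xml [<!ENTITY copy \"&#169;\"> "
                + "<!ENTITY logo SYSTEM \"http://www.xmlwriter.net/logo.gif\" NDATA gif>"
                + "<!ENTITY deg \"&#x00b0;\"> ]>\n" + "<root />";
        for (String decl : collect(xml)) {
            System.out.println(decl);
        }
    }
}
```

    With the XML from the question, this should report the logo declaration that dom4j drops. Writing it back into the DOCTYPE on output would still need extra work on the dom4j side.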