
Java StAX parser: not preserving double quotes for attributes


The StAX parser does not keep the double quotes around attribute values: when I print the events returned by XMLEventReader, the attributes come out in single quotes. That would be fine, but if I want to print the XML back out, perhaps selecting only a fragment of the original, the output is not the same as the input.

Input file:

<root>
  <mySubTrees>
    <mySubTree>
      <a property="target">
        <aa>123</aa>
      </a>
      <b>456</b>
      <c>789</c>
    </mySubTree>
  </mySubTrees>
</root>

Code:

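import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.events.XMLEvent;
import org.junit.Test; // JUnit 4 assumed

// Inside the test class; PREFIX_XML_FILES is a constant defined elsewhere in that class
@Test
public void test_getXmlFragment() throws Exception {
  byte[] fileContent = getXMLBytes();
  StringBuilder xmlFragment = new StringBuilder();

  XMLInputFactory factory = XMLInputFactory.newInstance();
  XMLEventReader eventReader = factory.createXMLEventReader(new ByteArrayInputStream(fileContent));
  while (eventReader.hasNext()) {
    XMLEvent event = eventReader.nextEvent();
    // Appends event.toString(), whose format is implementation-specific
    xmlFragment.append(event);
  }
  eventReader.close();

  System.out.println(xmlFragment);
}

private byte[] getXMLBytes() throws IOException {
  // available() is not guaranteed to report the full size, so read the whole stream (Java 9+)
  try (InputStream inputStream = this.getClass().getResource(PREFIX_XML_FILES + "/sss.xml").openStream()) {
    return inputStream.readAllBytes();
  }
}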

Output:

<?xml version="null" encoding='UTF-8' standalone='no'?>
<root>
    <mySubTrees>
        <mySubTree>
            <a property='target'>
                <aa>123</aa>
            </a>
            <b>456</b>
            <c>789</c>
        </mySubTree>
    </mySubTrees>
</root>

Desired output:

<?xml version="null" encoding="UTF-8" standalone="no"?>
<root>
    <mySubTrees>
        <mySubTree>
            <a property="target">
                <aa>123</aa>
            </a>
            <b>456</b>
            <c>789</c>
        </mySubTree>
    </mySubTrees>
</root>

Is there any way to fine-tune this?


Solution

  • No. XML makes no distinction between an attribute value delimited by single quotes and one delimited by double quotes (the XML 1.0 specification allows either), so it is unreasonable to expect a parser to preserve the difference.

    StAX's job is not to preserve the syntax of the XML file it is reading. StAX is a parser: its job is to relay the data model expressed in that XML, and the quote character is not part of that model. It does this job perfectly.

    Because parsers are not designed to keep this information, a requirement like yours is likely to force you to write your own XML library, which is a good sign that you shouldn't have this requirement in the first place.
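    A minimal sketch, assuming the JDK's built-in StAX implementation: instead of concatenating event.toString(), copy the events of one element into an XMLEventWriter. The writer, not the parser, then decides how attributes are quoted, and the quote style of the input makes no difference because it never reaches the data model. This still does not preserve the original quoting; it only makes the output follow the writer's conventions. The names FragmentCopy and extractFragment are hypothetical, and the inputs are a trimmed version of the question's file.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper class for illustration only
final class FragmentCopy {

    // Serializes the first element with the given local name (and its subtree)
    // through an XMLEventWriter rather than relying on event.toString().
    static String extractFragment(String xml, String localName) throws XMLStreamException {
        XMLEventReader reader = XMLInputFactory.newInstance()
                .createXMLEventReader(new StringReader(xml));
        StringWriter out = new StringWriter();
        XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);

        int depth = 0; // 0 means we have not entered the target element yet
        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (depth == 0) {
                if (event.isStartElement()
                        && event.asStartElement().getName().getLocalPart().equals(localName)) {
                    writer.add(event);
                    depth = 1;
                }
                continue;
            }
            writer.add(event);
            if (event.isStartElement()) {
                depth++;
            } else if (event.isEndElement()) {
                depth--;
                if (depth == 0) {
                    break; // the target element's end tag has been written
                }
            }
        }
        writer.flush();
        writer.close();
        reader.close();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        String doubleQuoted = "<root><a property=\"target\"><aa>123</aa></a><b>456</b></root>";
        String singleQuoted = "<root><a property='target'><aa>123</aa></a><b>456</b></root>";
        System.out.println(extractFragment(doubleQuoted, "a"));
        System.out.println(extractFragment(singleQuoted, "a"));
    }
}
```

    With the JDK's built-in implementation, both calls print <a property="target"><aa>123</aa></a>, since its XMLStreamWriter writes attribute values in double quotes. That is an implementation detail; the StAX API does not specify a quote character.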