
Using StAX, how can I read UTF-8 data containing & characters?


How can I use StAX to read all characters of an element's text, including the &? I have no control over the incoming XML file.

An example XML file is:

<?xml version="1.0" encoding="UTF-8"?>
<Employees>
    <Employee id="1">
        <age>22</age>
        <name>MyName &amp; Team 01/46</name>
        <gender>Female</gender>
        <role>Java Developer</role>
    </Employee>
    ....
</Employees>

In every attempt, only the "MyName" part of the name is read.
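To see what the reader actually reports, it can help to record every CHARACTERS event inside <name> instead of reading just one. The sketch below is a hypothetical diagnostic (the class name TextChunks and the method nameChunks are illustrative, not from the original), assuming the default XMLInputFactory with no extra properties set. How many chunks you get depends on the StAX implementation, so no particular split is promised here; only the concatenation of all chunks is expected to equal the full text.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class TextChunks {
    // Hypothetical helper: returns every CHARACTERS event reported inside <name>,
    // one list entry per event, using a factory with default settings.
    static List<String> nameChunks(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        List<String> chunks = new ArrayList<>();
        boolean inName = false;
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT && "name".equals(reader.getLocalName())) {
                    inName = true;
                } else if (event == XMLStreamConstants.END_ELEMENT && "name".equals(reader.getLocalName())) {
                    inName = false;
                } else if (inName && event == XMLStreamConstants.CHARACTERS) {
                    chunks.add(reader.getText());
                }
            }
        } finally {
            reader.close();
        }
        return chunks;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<Employees><Employee id=\"1\"><name>MyName &amp; Team 01/46</name></Employee></Employees>";
        List<String> chunks = nameChunks(xml);
        // Each entry is one event; reading only the first one loses the rest.
        System.out.println(chunks);
        System.out.println(String.join("", chunks));
    }
}
```

If the list printed first has more than one entry, that explains why a single getText() call returns only the beginning of the name.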

Attempt 1:

Path gpxPath = Paths.get( path);
XMLInputFactory xmlInputFactory = XMLInputFactory.newInstance();
XMLStreamReader reader;
reader = xmlInputFactory.createXMLStreamReader( new FileInputStream(gpxPath.toFile()), "UTF-8");
... 
String name = reader.getText();

Attempt 2:

XMLInputFactory xmlInputFactory = XMLInputFactory.newInstance();
try {
    XMLStreamReader xmlStreamReader = xmlInputFactory.createXMLStreamReader( 
          new DataInputStream(new FileInputStream(fileName)), "UTF-8");
    ... 
    name = new String(xmlStreamReader.getTextCharacters(),
            xmlStreamReader.getTextStart(), xmlStreamReader.getTextLength());
    // or ... 
    name = xmlStreamReader.getText();

How can I read the complete name, i.e. "MyName & Team 01/46"?


Solution

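  • The solution was to set the IS_COALESCING property on the XMLInputFactory. By default, a StAX parser may report the text of a single element as several consecutive CHARACTERS events (here, apparently split at the &amp; entity), so getText() returns only the first chunk. With IS_COALESCING set to true, the parser merges adjacent character data into a single event: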

    XMLInputFactory xmlInputFactory = XMLInputFactory.newInstance();
    xmlInputFactory.setProperty(XMLInputFactory.IS_COALESCING, true);
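A self-contained sketch of the fix, reading from an InputStream with the "UTF-8" encoding as in the question's attempts. The class name ReadName, the method readName, and passing the file path as args[0] are illustrative assumptions; the method simply returns the text of the first <name> element it finds.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class ReadName {
    // Hypothetical helper: returns the full text of the first <name> element,
    // "" if it is empty, or null if there is no <name> element at all.
    static String readName(InputStream in) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Merge adjacent character data into one CHARACTERS event.
        factory.setProperty(XMLInputFactory.IS_COALESCING, true);
        XMLStreamReader reader = factory.createXMLStreamReader(in, "UTF-8");
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "name".equals(reader.getLocalName())) {
                    // With coalescing on, the whole text arrives as one event.
                    if (reader.next() == XMLStreamConstants.CHARACTERS) {
                        return reader.getText();
                    }
                    return "";
                }
            }
            return null;
        } finally {
            // Closes the reader only; the caller still owns the InputStream.
            reader.close();
        }
    }

    public static void main(String[] args) throws IOException, XMLStreamException {
        // args[0]: path to the XML file (illustrative command-line usage).
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.println(readName(in));
        }
    }
}
```

An alternative that does not depend on the factory setting is XMLStreamReader.getElementText(), called while the reader is positioned on the <name> START_ELEMENT: its Javadoc states that it always returns coalesced content for a text-only element, regardless of the coalescing property.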