java, xml, encoding, stax

Encoding in the Java StAX parser


I'm using StAX to read an XML file, but I'm having problems with characters like žćčšđ. The code is almost the same as my SAX version, where I did not have this problem.

This is part of the XML document:

<?xml version="1.0" encoding="UTF-8" ?>
<Autor>
    <Id>1</Id>
    <Meno>Jano Žiška</Meno>
    <Email>[email protected]</Email>
    <tel_cislo typ="mobil">0944564685</tel_cislo>
    <plat>500</plat>
</Autor>

Java code:

        public static void main(String[] args) {
            try {
              XMLInputFactory f = XMLInputFactory.newInstance();
              XMLStreamReader r = f.createXMLStreamReader(new FileReader(SUBOR));
            }
....
            if (r.getLocalName().equals(ELEMENT_MENO)) {
                String v = r.getElementText();
                System.out.println("meno:\t\t\t " + v);
            }

How can I specify the encoding in Java? Thanks.


Solution

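  • Unless you have a really good reason, always use binary streams (InputStream/OutputStream) with XML, not character streams (Reader/Writer). Using a character stream risks corrupting the XML, as the code in the question shows: FileReader decodes the file with the JVM's default charset and ignores the encoding declared in the XML prolog. Given an InputStream, the parser reads the raw bytes and detects the encoding itself.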

    XMLStreamReader r = f.createXMLStreamReader(new FileInputStream( SUBOR ));
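To illustrate the point, here is a minimal self-contained sketch. It feeds UTF-8 bytes to StAX through an InputStream and reads the `Meno` element from a cut-down version of the document in the question. The class name `StaxEncodingSketch` and the helper `readMeno` are made up for this example, and the XML is built in memory rather than read from `SUBOR`. If you really need to force an encoding instead of letting the parser detect it, `XMLInputFactory` also has a `createXMLStreamReader(InputStream, String)` overload, which the sketch uses when an encoding name is passed.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical class name, used only for this illustration.
class StaxEncodingSketch {

    // Returns the text of the first <Meno> element, or null if there is none.
    // If encoding is null, the parser detects the encoding from the byte stream;
    // otherwise the given encoding name (e.g. "UTF-8") is passed to the factory.
    static String readMeno(InputStream in, String encoding) throws XMLStreamException {
        XMLInputFactory f = XMLInputFactory.newInstance();
        XMLStreamReader r = (encoding == null)
                ? f.createXMLStreamReader(in)
                : f.createXMLStreamReader(in, encoding);
        try {
            while (r.hasNext()) {
                if (r.next() == XMLStreamConstants.START_ELEMENT
                        && "Meno".equals(r.getLocalName())) {
                    return r.getElementText();
                }
            }
            return null;
        } finally {
            r.close();
        }
    }

    public static void main(String[] args) throws Exception {
        // \u017D is Ž and \u0161 is š; escapes keep this source file ASCII-only.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<Autor><Id>1</Id><Meno>Jano \u017Di\u0161ka</Meno></Autor>";
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);

        // Let the parser detect the encoding from the bytes and the prolog.
        System.out.println("meno: " + readMeno(new ByteArrayInputStream(bytes), null));

        // Or state the encoding explicitly.
        System.out.println("meno: " + readMeno(new ByteArrayInputStream(bytes), "UTF-8"));
    }
}
```

With a real file, you would pass `new FileInputStream(SUBOR)` in place of the `ByteArrayInputStream`. If the names still look wrong on the console after this change, the console's own encoding is the next thing to check, since `System.out` encodes output separately from how the parser decoded the file.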