
Java XML error with UTF-8 encoding


When I write the file, the output shows strange characters. From what I have read, I need to use FileOutputStream to solve the problem, but I am very new to this and do not know how to do it. My code is wrong: there is an error at build(xml), and I do not know whether this is the right way to write the output file.

<?xml version="1.0" encoding="UTF-8"?>
 <prueba>
     <reg id="576340">
           <dato cant="680" id="1" val="-1" num="" desc="résd" />
           <dato cant="684" id="5" val="-1" num="" desc="да и вообще" /> 
           <dato cant="1621" id="1" val="-1" num="" desc="Hi" />
           <dato cant="1625" id="5" val="-1" num="" desc="Hola" />  
     </reg>
 </prueba>


public static void main(String[] args) throws FileNotFoundException, 
     JDOMException, IOException {

SAXBuilder builder = new SAXBuilder();
File xml = new File("c:\\prueba3.xml");
Writer out = new BufferedWriter(new OutputStreamWriter(
new FileOutputStream(xml), "UTF8"));
Document doc = (Document) new SAXBuilder().build(xml);
Element raiz = doc.getRootElement();
List articleRow = raiz.getChildren("reg"); 

for (int i = 0; i < articleRow.size(); i++) {

    Element row = (Element) articleRow.get(i);
    List images = row.getChildren("dato");

     for (int j = 0; j < images.size(); j++) {

         Element row2 = (Element) images.get(j);
         String texto = row2.getAttributeValue("desc") ;
         String id = row2.getAttributeValue("id"); 

         if ((texto != null) && !texto.isEmpty() &&
            (id.equals("1") || id.equals("2"))){

         //row2.getChild("desc").setText("valor");
         out.append(row2.getAttribute("desc").setValue("raúl")
                   .toString());
         }
     }
}
 out.flush();
 out.close();
 System.out.println("fin de programa");  
}

This is the output:

<?xml version="1.0" encoding="UTF-8"?>
 <prueba>
    <reg id="576340">
           <dato cant="680" id="1" val="-1" num="" desc="ra򬢠/>
           <dato cant="684" id="5" val="-1" num="" desc="..?? ? ??????/>
           <dato cant="1621" id="1" val="-1" num="" desc="ra򬢠/>
           <dato cant="1625" id="5" val="-1" num="" desc="Hola" />
    </reg>
  </prueba>  

Error log (the message "Final de archivo prematuro" is Spanish for "Premature end of file"):

Exception in thread "main" org.jdom.input.JDOMParseException: Error on line 1 of document file:/c:/prueba3.xml: Final de archivo prematuro.
at org.jdom.input.SAXBuilder.build(SAXBuilder.java:530)
at org.jdom.input.SAXBuilder.build(SAXBuilder.java:905)
at org.jdom.input.SAXBuilder.build(SAXBuilder.java:884)
at Prueba.main(Prueba.java:27)
Caused by: org.xml.sax.SAXParseException; systemId: file:/c:/prueba3.xml; lineNumber: 1; columnNumber: 1; Final de archivo prematuro.
at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
at org.jdom.input.SAXBuilder.build(SAXBuilder.java:518)
... 3 more
Caused by: org.xml.sax.SAXParseException; systemId: file:/c:/prueba3.xml; lineNumber: 1; columnNumber: 1; Final de archivo prematuro.
at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
at org.jdom.input.SAXBuilder.build(SAXBuilder.java:518)
at org.jdom.input.SAXBuilder.build(SAXBuilder.java:905)
at org.jdom.input.SAXBuilder.build(SAXBuilder.java:884)
at Prueba.main(Prueba.java:27)

I would appreciate your help.


Solution

  • Depending on the target encoding, you have to decide how the data is written to the file system. You chose to write it as 'UTF8'.

    Writer out = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(xml), "UTF8"));

    You have to make sure that the program that loads the data knows it is encoded in UTF-8. For example, Notepad++ lets you choose an encoding other than the system default. In most cases UTF-8 is not the system default, so you have to specify the encoding when loading the files.
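
To illustrate why the reader has to know the encoding, here is a minimal sketch that uses only the JDK. The class name `Utf8RoundTrip`, its helper methods and the temp file are made up for this example and are not part of the question's code. The text is written once as UTF-8 and then decoded twice: once as UTF-8 and once, deliberately wrongly, as ISO-8859-1.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class Utf8RoundTrip {

    // Write text to the file, always encoding it as UTF-8.
    static void write(Path file, String text) throws IOException {
        try (Writer out = new BufferedWriter(new OutputStreamWriter(
                new FileOutputStream(file.toFile()), StandardCharsets.UTF_8))) {
            out.write(text);
        }
    }

    // Read the whole file, decoding its bytes with the given charset.
    static String read(Path file, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader in = new BufferedReader(new InputStreamReader(
                new FileInputStream(file.toFile()), charset))) {
            int c;
            while ((c = in.read()) != -1) {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("prueba", ".xml"); // hypothetical temp file
        // \u00fa is "ú"; the escape keeps the example independent of the
        // encoding javac assumes for the source file.
        String text = "<dato desc=\"ra\u00fal\" />";
        write(file, text);

        String ok = read(file, StandardCharsets.UTF_8);
        String bad = read(file, StandardCharsets.ISO_8859_1);

        System.out.println(ok.equals(text));  // true: same charset on both sides
        System.out.println(bad.equals(text)); // false: "ú" came back as two characters
        Files.delete(file);
    }
}
```

The file on disk is identical in both cases; only the charset used for decoding differs. A string literal such as "raúl" in the source file has the same dependency at compile time, which is why the sketch uses a Unicode escape.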

    Please also see the question "Java FileReader encoding issue".
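
Separately from the encoding, the "premature end of file" at line 1, column 1 in the stack trace is consistent with the parser seeing an empty file. `new FileOutputStream(File)` truncates the file as soon as it is opened, and in the posted code the writer on prueba3.xml is created before `build(xml)` runs on the same file. The sketch below uses only the JDK and hypothetical names (`ReadBeforeWrite`, `rewrite`, `sizeAfterOpeningForWrite`); it shows the truncation and the safer order of reading first and opening the output last.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ReadBeforeWrite {

    // Opening a FileOutputStream (without append) empties the file at once,
    // so the size measured while the stream is open is already 0.
    static long sizeAfterOpeningForWrite(Path file) throws IOException {
        try (OutputStream out = new FileOutputStream(file.toFile())) {
            return Files.size(file);
        }
    }

    // Read everything first, change it in memory, and only then open the
    // same file for writing.
    static void rewrite(Path file, String from, String to) throws IOException {
        String content = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        String updated = content.replace(from, to);
        try (Writer out = new OutputStreamWriter(
                new FileOutputStream(file.toFile()), StandardCharsets.UTF_8)) {
            out.write(updated);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("prueba", ".xml"); // hypothetical temp file
        Files.write(file, "<dato desc=\"Hi\" />".getBytes(StandardCharsets.UTF_8));

        rewrite(file, "Hi", "ra\u00fal");
        String result = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        System.out.println(result.equals("<dato desc=\"ra\u00fal\" />")); // true

        System.out.println(sizeAfterOpeningForWrite(file)); // 0
        Files.delete(file);
    }
}
```

In the JDOM program the same order would apply: build the Document, change the attributes, and only then open the output stream. Writing the whole document back is usually done with `XMLOutputter` and a `Format` whose encoding is set to "UTF-8", rather than by appending the result of `Attribute.toString()`; that part is not shown here because it needs the JDOM library.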