I have this code, which reads "prueba3.xml" and writes it back to the same file. The file is UTF-8, but when I write it the encoding changes and strange characters appear. I have added format.setEncoding("UTF-8"), but the output is still wrong. Is it possible to set the output encoding to UTF-8 when using JDOM's SAXBuilder?
Input XML:
<?xml version="1.0" encoding="UTF-8"?>
<prueba>
<reg id="576340">
<dato cant="856" id="6" val="-1" num="" desc="ñápás" />
<dato cant="680" id="1" val="-1" num="" desc="résd" />
<dato cant="684" id="5" val="-1" num="" desc="..да и вообем" />
<dato cant="1621" id="1" val="-1" num="" desc="hi" />
<dato cant="1625" id="5" val="-1" num="" desc="Hola" />
</reg>
</prueba>
This is the code:
public static void main(String[] args) throws FileNotFoundException, JDOMException, IOException
{
// Create a SAXBuilder to parse the file
File xml = new File("c:\\prueba3.xml");
Document doc = (Document) new SAXBuilder().build(xml);
Element raiz = doc.getRootElement();
// Iterate over the children of the root element
List articleRow = raiz.getChildren("reg");
for (int i = 0; i < articleRow.size(); i++) {
Element row = (Element) articleRow.get(i);
List images = row.getChildren("dato");
for (int j = 0; j < images.size(); j++) {
Element row2 = (Element) images.get(j);
String texto = row2.getAttributeValue("desc") ;
String id = row2.getAttributeValue("id");
if ((texto != null) && !texto.isEmpty() && "1".equals(id)) {
row2.getAttribute("desc").setValue("Raúl");
}
}
Format format = Format.getRawFormat();
format.setEncoding("UTF-8");
XMLOutputter xmlOutput = new XMLOutputter(format);
xmlOutput = new XMLOutputter(format);
xmlOutput.output(doc, new FileWriter("c:\\prueba3.xml"));
}
System.out.println("fin");
}
Output XML:
<?xml version="1.0" encoding="UTF-8"?>
<prueba>
<reg id="576340">
<dato cant="856" id="6" val="-1" num="" desc="s" />
<dato cant="680" id="1" val="-1" num="" desc="Ra/>
<dato cant="684" id="5" val="-1" num="" desc="..?? ? ??????" />
<dato cant="1621" id="1" val="-1" num="" desc="Ra/>
<dato cant="1625" id="5" val="-1" num="" desc="Hola" />
</reg>
</prueba>
Greetings and thanks for your time.
This is a relatively common problem to encounter when using JDOM, especially in countries/regions with non-Latin alphabets. In some senses I regret maintaining the use of Writer outputs at all in JDOM.
See the JavaDoc on XMLOutputter too: http://www.jdom.org/docs/apidocs/org/jdom2/output/XMLOutputter.html
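The issue is that FileWriter uses the system's default encoding to convert the characters it receives into the underlying byte data. JDOM cannot control that conversion, so the XML declaration can say UTF-8 while the bytes are actually written in a different encoding.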
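To make the effect concrete, here is a small stand-alone sketch (no JDOM involved) that encodes the same text with two charsets. ISO-8859-1 is only an assumed stand-in for a non-UTF-8 platform default; the actual default on the asker's machine is not known. The class name EncodingDemo and the helper hex are made up for illustration, and the strings use Unicode escapes so the example does not depend on the source file's own encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {
    // Encodes the text with the given charset and returns the bytes as hex.
    public static String hex(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String raul = "Ra\u00fal";       // "Raúl"
        String da = "\u0434\u0430";      // Cyrillic "да"

        // UTF-8 needs two bytes for the accented letter.
        System.out.println(hex(raul, StandardCharsets.UTF_8));      // 52 61 C3 BA 6C
        // ISO-8859-1 (assumed stand-in for the platform default) uses one byte.
        System.out.println(hex(raul, StandardCharsets.ISO_8859_1)); // 52 61 FA 6C

        System.out.println(hex(da, StandardCharsets.UTF_8));        // D0 B4 D0 B0
        // Cyrillic cannot be represented, so each letter becomes '?' (0x3F).
        System.out.println(hex(da, StandardCharsets.ISO_8859_1));   // 3F 3F
    }
}
```

A lone FA byte is not valid UTF-8, and unmappable characters are replaced with question marks, which is consistent with the damaged desc values and the "..?? ? ??????" text in the output above.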
If you change the line of code:
xmlOutput.output(doc, new FileWriter("c:\\prueba3.xml"));
to use an OutputStream instead of a Writer:
try (OutputStream fos = new FileOutputStream("c:\\prueba3.xml")) {
xmlOutput.output(doc, fos);
}
... JDOM will treat the output as a byte stream and apply the encoding set on the Format itself, so the system's default encoding won't interfere with the output.
(P.S. There's no reason to assign the xmlOutput instance twice.)
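If a Writer is still needed for some reason, one option is to build it with an explicit charset rather than relying on FileWriter's default. The following is a minimal stand-alone sketch of that idea using only the JDK; the class name Utf8WriterDemo, the method writeUtf8, and the temp file are hypothetical and exist only for illustration.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class Utf8WriterDemo {
    // Writes text through a Writer whose charset is fixed to UTF-8,
    // so the platform default encoding is never consulted.
    public static void writeUtf8(Path target, String text) throws IOException {
        try (OutputStream out = Files.newOutputStream(target);
             Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            writer.write(text);
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical temp file, used instead of overwriting a real document.
        Path tmp = Files.createTempFile("utf8-demo", ".xml");
        writeUtf8(tmp, "<dato desc=\"Ra\u00fal\" />");

        // Decode the file explicitly as UTF-8 to check the round trip.
        String readBack = new String(Files.readAllBytes(tmp), StandardCharsets.UTF_8);
        System.out.println(readBack.equals("<dato desc=\"Ra\u00fal\" />"));
        Files.delete(tmp);
    }
}
```

With JDOM, the same kind of Writer could be passed to xmlOutput.output(doc, writer), provided its charset matches the one set with format.setEncoding. The OutputStream version shown earlier is simpler because there is only one place where the encoding is chosen.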