import org.jdom2.Document;
import org.jdom2.input.SAXBuilder;
import java.io.FileReader;

public class Test1 {
    @org.junit.Test
    public void main() throws Exception {
        SAXBuilder sax = new SAXBuilder();
        // FileReader decodes the file with the JVM default charset
        Document doc = sax.build(new FileReader("resources/file.xml"));
        System.out.println(doc.getRootElement().getText());
    }
}
file.xml contains this: <root>©</root>
The file is encoded as UTF-8.
Libraries used: jdom2-2.06, hamcrest-core-1.3, junit-4.11.
When I run this in IntelliJ, the output is: ©
When I run this in NetBeans, the output is: ©
If I move the code into public static void main and run it, everything is OK.
If I change FileReader to FileInputStream, everything is OK.
If I change FileReader to StringReader("<root>©</root>"), everything is OK.
What could be causing this?
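You are not specifying a character set when reading the file, so FileReader uses the JVM default. As far as I know, running from IntelliJ usually defaults to UTF-8, while Eclipse (at least on Windows) defaults to the system's non-Unicode character set (e.g. Cp1252 in Western Europe). If the file's UTF-8 bytes are decoded with a different character set, non-ASCII characters such as © come out garbled.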
You will need to be explicit, as described in the documentation of FileReader:
The constructors of this class assume that the default character encoding and the default byte-buffer size are appropriate. To specify these values yourself, construct an InputStreamReader on a FileInputStream.
In other words:
new InputStreamReader(new FileInputStream("resources/file.xml"), StandardCharsets.UTF_8)
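To illustrate why the explicit charset matters, here is a minimal sketch that uses only the JDK (no JDOM). The class name ExplicitCharsetRead and the helper writeSample are hypothetical, and ISO-8859-1 is used as a stand-in for a single-byte default such as Cp1252, since it is guaranteed to be available on every JVM. It writes the sample document as UTF-8 and reads it back twice, once with the correct charset and once with the wrong one:

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ExplicitCharsetRead {
    // Reads the whole file, decoding its bytes with the given charset.
    public static String readAll(Path file, Charset charset) throws IOException {
        try (Reader reader = new InputStreamReader(new FileInputStream(file.toFile()), charset)) {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        }
    }

    // Hypothetical helper: writes the sample document to a temp file as UTF-8.
    // \u00A9 is the copyright sign; the escape avoids source-encoding issues.
    public static Path writeSample() throws IOException {
        Path file = Files.createTempFile("file", ".xml");
        Files.write(file, "<root>\u00A9</root>".getBytes(StandardCharsets.UTF_8));
        return file;
    }

    public static void main(String[] args) throws IOException {
        Path file = writeSample();
        // Correct charset: the two UTF-8 bytes C2 A9 become one character.
        System.out.println(readAll(file, StandardCharsets.UTF_8));
        // Wrong charset: each byte becomes its own character.
        System.out.println(readAll(file, StandardCharsets.ISO_8859_1));
        Files.delete(file);
    }
}
```

With the wrong charset, the two bytes that encode © in UTF-8 are decoded as two separate characters, which is the kind of garbling you would see when the JVM default is not UTF-8.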
Or alternatively, let SAXBuilder handle this for you and just give it an InputStream. I believe, but I am not 100% sure, that this will determine the character set from the XML declaration:
sax.build(new FileInputStream("resources/file.xml"))
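The difference between handing a parser raw bytes and handing it already-decoded characters can be sketched with the JDK's built-in DOM parser. This is a stand-in, not JDOM itself: I am assuming SAXBuilder behaves the same way because it delegates to an underlying XML parser. The class name StreamVsReader is hypothetical, and ISO-8859-1 again stands in for a non-UTF-8 default:

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

public class StreamVsReader {
    // The sample document as UTF-8 bytes, with no XML declaration.
    static final byte[] XML = "<root>\u00A9</root>".getBytes(StandardCharsets.UTF_8);

    // Byte stream: the parser detects the encoding itself
    // (UTF-8 when there is no BOM and no XML declaration).
    public static String parseFromStream() throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new ByteArrayInputStream(XML))
                .getDocumentElement().getTextContent();
    }

    // Character stream: the bytes are decoded by the Reader before the
    // parser sees them, so a wrong charset here cannot be corrected later.
    public static String parseFromReader(Charset charset) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        InputSource source = new InputSource(
                new InputStreamReader(new ByteArrayInputStream(XML), charset));
        return builder.parse(source).getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parseFromStream());
        System.out.println(parseFromReader(StandardCharsets.UTF_8));
        System.out.println(parseFromReader(StandardCharsets.ISO_8859_1));
    }
}
```

Only the last call returns garbled text, which matches the observation in the question that FileInputStream and StringReader work while FileReader does not.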