I am trying to read a file that contains some Japanese characters.
RandomAccessFile file = new RandomAccessFile("japanese.txt", "r");
String line;
while ((line = file.readLine()) != null) {
    System.out.println(line);
}
It returns garbled characters instead of Japanese. But when I convert the encoding, it prints properly:
line = new String(line.getBytes("ISO-8859-1"), "UTF-8");
What does this mean? Is the text file in ISO-8859-1 encoding?
$ file -i japanese.txt
returns the following:
japanese.txt: text/plain; charset=utf-8
Can you please explain why it explicitly requires converting the file's content from Latin-1 to UTF-8?
No, the file is not in ISO-8859-1. readLine
is an obsolete method that predates charsets/encodings and such. It turns every byte into a char with high byte 0, which is effectively ISO-8859-1 decoding — that is why getBytes("ISO-8859-1") recovers the original UTF-8 bytes, which new String(..., "UTF-8") then decodes correctly. Worse, some line-reading APIs treat byte 0x85 (the Unicode NEL line separator) as a line break, and if that byte occurred inside a UTF-8 multibyte sequence, the actual line would be broken into two lines. Other failure scenarios are possible as well.
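To see the mechanism in isolation, here is a minimal sketch (no file involved; the byte loop just imitates what readLine does to each byte, and the last line is the same round-trip as in the question):

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    public static void main(String[] args) {
        // UTF-8 bytes of some Japanese text
        byte[] utf8 = "日本語".getBytes(StandardCharsets.UTF_8);

        // RandomAccessFile.readLine effectively turns each byte into a
        // char with high byte 0 -- i.e. it decodes as ISO-8859-1:
        StringBuilder sb = new StringBuilder();
        for (byte b : utf8) {
            sb.append((char) (b & 0xFF));
        }
        String garbled = sb.toString(); // mojibake, not Japanese

        // The round-trip recovers the original bytes and decodes them
        // as UTF-8, which is exactly what the question's conversion does:
        String fixed = new String(garbled.getBytes(StandardCharsets.ISO_8859_1),
                                  StandardCharsets.UTF_8);
        System.out.println(fixed.equals("日本語")); // prints true
    }
}
```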
Better to use Files
. It has newBufferedReader(Path, Charset)
, and the overload without a Charset uses UTF-8 by default.
Path path = Paths.get("japanese.txt");
try (BufferedReader file = Files.newBufferedReader(path)) {
    String line;
    while ((line = file.readLine()) != null) {
        System.out.println(line);
    }
}
Now you'll read correct Strings.
A RandomAccessFile is basically intended for binary data.
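If you genuinely need RandomAccessFile (e.g. for seeking), a safer approach is to read the raw bytes yourself and decode them as UTF-8, instead of relying on readLine. A minimal sketch — the temp file only exists to make the example self-contained:

```java
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class RafUtf8Demo {
    public static void main(String[] args) throws Exception {
        // Create a sample UTF-8 file so the example runs on its own
        Path path = Files.createTempFile("japanese", ".txt");
        Files.write(path, "こんにちは\n世界\n".getBytes(StandardCharsets.UTF_8));

        try (RandomAccessFile raf = new RandomAccessFile(path.toFile(), "r")) {
            byte[] buf = new byte[(int) raf.length()];
            raf.readFully(buf);
            // Decode the whole buffer as UTF-8, then split into lines;
            // \R matches any Unicode line-break sequence
            String content = new String(buf, StandardCharsets.UTF_8);
            for (String line : content.split("\\R")) {
                System.out.println(line);
            }
        }
        Files.delete(path);
    }
}
```

This only works comfortably for files that fit in memory; for seeking into the middle of a UTF-8 file you would still have to find a byte that starts a character before decoding.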