When trying to write some UTF-8 data to a file, I end up with garbage in the file. The code is as follows:
public static boolean saveToFile(StringBuffer buffer,
                                 String fileName,
                                 ArrayList exceptionList,
                                 String className)
{
    log.debug("In saveToFile for file [" + fileName + "]");
    RandomAccessFile raf = null;
    File file = new File(fileName);
    File backupFile = new File(fileName + "_bck");
    try
    {
        if (file.exists())
        {
            if (backupFile.exists())
            {
                backupFile.delete();
            }
            file.renameTo(backupFile);
        }
        raf = new RandomAccessFile(file, "rw");
        raf.writeBytes(buffer.toString());
        raf.close();
The output of buffer.toString() is
<?xml version="1.0" encoding="UTF-8"?>
<ivr>
<version>1.1</version>
<templateName>αβγδεζη
The data in the file, however, is
<?xml version="1.0" encoding="UTF-8"?>
<ivr>
<version>1.1</version>
<templateName>▒▒▒▒▒▒▒</templateName>
How can I make sure that the data in the file itself is UTF-8?
I'm not surprised you get garbage:
raf.writeBytes(buffer.toString())
The documentation for RandomAccessFile.writeBytes(String) says (emphasis added):
Writes the string to the file as a sequence of bytes. Each character in the string is written out, in sequence, by discarding its high eight bits.
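To make that concrete (my own illustration, not from the docs): the Greek letter α in your template name is U+03B1, so only its low byte survives, and that lone byte is not a valid UTF-8 sequence:

char alpha = '\u03B1';                     // Greek small letter alpha
byte kept = (byte) alpha;                  // 0xB1 — all that writeBytes() keeps
byte[] utf8 = String.valueOf(alpha)
        .getBytes(java.nio.charset.StandardCharsets.UTF_8);   // 0xCE 0xB1 — correct UTF-8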
In a few narrow circumstances (plain ASCII text, for example), that operation will result in a correctly encoded file, but in most it won't. That writeBytes() method is a foolish design by the Java developers. You need to correctly encode your text as bytes in UTF-8, and then write those bytes.
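For example, if you do keep the RandomAccessFile, a minimal sketch (the helper name is mine) is:

import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

// Encode the text to UTF-8 explicitly, then write the raw bytes.
static void writeUtf8(RandomAccessFile raf, CharSequence text) throws IOException {
    raf.write(text.toString().getBytes(StandardCharsets.UTF_8));
}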
Do you really need to operate on the file as a random access file? If not, just manipulate it with a Writer wrapping an OutputStream.
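A sketch of that approach (assuming you write the whole document in one go; the method name is illustrative):

import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Files.newBufferedWriter hands you a Writer over a UTF-8-encoding stream;
// try-with-resources flushes and closes it when done.
static void saveUtf8(String fileName, StringBuffer buffer) throws IOException {
    try (Writer out = Files.newBufferedWriter(Paths.get(fileName), StandardCharsets.UTF_8)) {
        out.append(buffer);
    }
}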
You could also use Charset.encode(CharBuffer) to produce a ByteBuffer holding the encoded bytes, then write those bytes to the file. Since buffer is a StringBuffer rather than a CharBuffer, wrap it with CharBuffer.wrap(...) first:

raf.getChannel().write(StandardCharsets.UTF_8.encode(CharBuffer.wrap(buffer)));
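One caveat: don't write the ByteBuffer's array() directly, because the backing array can be larger than the encoded content and you would write stray bytes past the buffer's limit. Writing through the file's channel respects the buffer's position and limit.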