
Unexpected output with RandomAccessFile


I'm trying to learn about RandomAccessFile but after creating a test program I'm getting some bizarre output.

import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

public class RandomAccessFileTest
{
    public static void main(String[] args) throws IOException
    {
        // Create a new blank file
        File file = new File("RandomAccessFileTest.txt");
        file.createNewFile();
        
        // Open the file in read/write mode
        RandomAccessFile randomfile = new RandomAccessFile(file, "rw");
        
        // Write stuff
        randomfile.write("Hello World".getBytes());
        
        // Go to a location
        randomfile.seek(0);
        
        // Get the pointer to that location
        long pointer = randomfile.getFilePointer();
        System.out.println("location: " + pointer);
        
        // Read a char (two bytes?)
        char letter = randomfile.readChar();
        System.out.println("character: " + letter);
        
        randomfile.close();
    }
}

This program prints:

location: 0

character: ?

It turns out that the value of letter is '䡥' when it should be 'H'.

I've found a question similar to this, and apparently this is caused by reading one byte instead of two, but it didn't explain how exactly to fix it.


Solution

  • You've written "Hello World" with getBytes(), which uses the platform default encoding. That encoding is likely to use a single byte per character for this text.

    You're then calling RandomAccessFile.readChar, which always reads two bytes. Documentation:

    Reads a character from this file. This method reads two bytes from the file, starting at the current file pointer. If the bytes read, in order, are b1 and b2, where 0 <= b1, b2 <= 255, then the result is equal to:

       (char)((b1 << 8) | b2)
    

    This method blocks until the two bytes are read, the end of the stream is detected, or an exception is thrown.

    So H and e are being combined into a single character. H is U+0048 and e is U+0065, so assuming they were written as single-byte ASCII characters, you're reading bytes 0x48 and 0x65 and combining them into U+4865, which is a Han character for "a moving cart".
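To make the arithmetic concrete, here is a minimal sketch that applies the formula quoted from the documentation to the first two bytes of the file. The class name ReadCharArithmetic and the helper combine are hypothetical names used only for illustration, and the sketch assumes the text was written one byte per character.

```java
public class ReadCharArithmetic {
    // Mirrors the formula quoted from the readChar documentation:
    // the first byte becomes the high 8 bits, the second the low 8 bits.
    static char combine(int b1, int b2) {
        return (char) ((b1 << 8) | b2);
    }

    public static void main(String[] args) {
        int b1 = 'H'; // 0x48, first byte in the file
        int b2 = 'e'; // 0x65, second byte in the file
        char combined = combine(b1, b2);
        System.out.printf("U+%04X%n", (int) combined); // prints U+4865
    }
}
```

The result is a single char built from two unrelated letters, which is why the console shows '?' or '䡥' rather than 'H'.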

    Basically, you shouldn't be using readChar to try to read this data.

    Usually, to read a text file you want an InputStreamReader (with an appropriate encoding) wrapping an InputStream (e.g. a FileInputStream). RandomAccessFile isn't really ideal for this: you could read data into a byte[] and then convert that into a String, but there are all kinds of subtleties you'd need to think about.
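Both approaches mentioned above can be sketched roughly as follows. This is an illustration under stated assumptions rather than a definitive fix: the class name ReadTextDemo and its helper methods are hypothetical, the file is written to a temporary location, and UTF-8 is chosen explicitly here instead of the platform default used in the question.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.RandomAccessFile;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ReadTextDemo {
    // Writes the sample text with an explicit encoding (assumption: UTF-8).
    static Path writeSample() {
        try {
            Path path = Files.createTempFile("RandomAccessFileTest", ".txt");
            Files.write(path, "Hello World".getBytes(StandardCharsets.UTF_8));
            return path;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Option 1: decode through an InputStreamReader wrapping a FileInputStream.
    static char firstCharViaReader(Path path) {
        try (Reader reader = new InputStreamReader(
                new FileInputStream(path.toFile()), StandardCharsets.UTF_8)) {
            int c = reader.read(); // one decoded char, or -1 at end of stream
            if (c == -1) {
                throw new IllegalStateException("file is empty");
            }
            return (char) c;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Option 2: keep RandomAccessFile, read raw bytes, then decode them explicitly.
    static String readAllViaRandomAccessFile(Path path) {
        try (RandomAccessFile raf = new RandomAccessFile(path.toFile(), "r")) {
            byte[] bytes = new byte[(int) raf.length()];
            raf.seek(0);
            raf.readFully(bytes);
            return new String(bytes, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path path = writeSample();
        System.out.println("character: " + firstCharViaReader(path)); // character: H
        System.out.println("text: " + readAllViaRandomAccessFile(path)); // text: Hello World
    }
}
```

One of the subtleties with the second option: in a multi-byte encoding such as UTF-8, seeking to an arbitrary byte offset can land in the middle of a character, so decoding from that point may produce garbage. The sketch sidesteps this by always decoding from offset 0.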