
Java FileInputStream.read() skips bytes


I am reading a binary file one byte at a time using Java's FileInputStream.read() and keeping track of the position in the file by incrementing a variable i. I am looking for a specific character, and for the first part of the binary file, the correct offset is returned.

However, later in the file, the offsets (as tracked by i) start to fall behind the actual offsets in the file. (For example, a character at 0x4c5 was reported at 0x4c3.) It therefore appears that FileInputStream.read() skips bytes. The gap grows progressively: by the end of the file, i was 60 bytes less than the actual offset.

Here is some of my code.

in = new FileReader(path);
int c = 0;
int i = -1;

while (c != -1) {
    i++;
    try {
        c = in.read();
        if (c == 0x47) {
            print("Found G at 0x" + Integer.toHexString(i));
        }
    } catch(IOException e) ...

What could be causing this? Furthermore, how can this be avoided?


Solution

  • I think the problem is that you are actually reading from a Reader (FileReader), not an InputStream. Certainly, that is what the code you showed us does, even though the question mentions FileInputStream.

    A Reader.read() call will consume one or more bytes (see note 1) and return a single char that is represented by those bytes.

    Solution: Don't use a Reader to read a binary file. Use an InputStream or some subclass of InputStream.


    1 - The actual behavior depends on the character encoding that FileReader uses. For example, if the encoding is UTF-8, then bytes greater than 0x7f are treated as part of a multi-byte character. If you read arbitrary binary data as if it were UTF-8 encoded text, the result is liable to be garbage, and I would certainly expect the offsets to be "off".
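
To make the difference concrete, here is a minimal sketch that counts positions both ways over the same four bytes. The class name OffsetDemo, the helper methods, and the sample data are made up for illustration, and UTF-8 is chosen explicitly as an assumption; the encoding your FileReader actually uses may differ. For a real file you would pass something like new BufferedInputStream(new FileInputStream(path)) to byteOffsets.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class OffsetDemo {
    // Positions of `target` counted in bytes: one read() call consumes exactly one byte.
    static List<Long> byteOffsets(InputStream in, int target) throws IOException {
        List<Long> found = new ArrayList<>();
        long offset = 0;
        int b;
        while ((b = in.read()) != -1) {
            if (b == target) {
                found.add(offset);
            }
            offset++;
        }
        return found;
    }

    // Positions of `target` counted in chars: one read() call may consume several bytes.
    static List<Long> charOffsets(Reader in, int target) throws IOException {
        List<Long> found = new ArrayList<>();
        long index = 0;
        int c;
        while ((c = in.read()) != -1) {
            if (c == target) {
                found.add(index);
            }
            index++;
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        // Sample data (an assumption for this demo): 'G', a two-byte UTF-8 sequence, 'G'.
        byte[] data = {0x47, (byte) 0xC3, (byte) 0xA9, 0x47};

        List<Long> asBytes = byteOffsets(new ByteArrayInputStream(data), 0x47);
        List<Long> asChars = charOffsets(
                new InputStreamReader(new ByteArrayInputStream(data), StandardCharsets.UTF_8), 0x47);

        System.out.println("InputStream offsets: " + asBytes); // prints [0, 3]
        System.out.println("Reader offsets:      " + asChars); // prints [0, 2]
    }
}
```

The second G sits at byte offset 3, but the Reader reports it at index 2 because the two bytes 0xC3 0xA9 were decoded into a single char. Each multi-byte sequence in the file widens the gap by at least one, which matches the progressively growing shortfall described in the question.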