I am reading a binary file one byte at a time using Java's FileInputStream.read(), keeping track of the position in the file by incrementing a variable i. I am looking for a specific character, and for the first part of the binary file the correct offset is reported. Later in the file, however, the offsets (as tracked by i) start falling behind the actual offsets in the file. (For example, a character at 0x4c5 was reported at 0x4c3.) It appears that FileInputStream.read() is skipping bytes: i progressively fell further behind the actual file offset, and by the end of the file it was 60 bytes short.
Here is some of my code.
in = new FileReader(path);
int c = 0;
int i = -1;
while (c != -1) {
    i++;
    try {
        c = in.read();
        if (c == 0x47) {
            print("Found G at 0x" + Integer.toHexString(i));
        }
    } catch (IOException e) { ... }
}
What could be causing this? Furthermore, how can this be avoided?
I think that the problem is that you are actually reading from a Reader, not an InputStream. Certainly, that is what you are doing in the code you showed us! A Reader.read() call will consume one or more bytes¹ and return the single char that is represented by those bytes.
Solution: Don't use a Reader to read a binary file. Use an InputStream or some subclass of InputStream.
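For instance, a corrected version of the loop might look something like the sketch below. (ByteScanner and findOffsets are illustrative names, not from the original post; try-with-resources is used so the stream is closed automatically.)

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

public class ByteScanner {

    // Scan the file byte by byte and return the byte offset of every
    // occurrence of the target value. InputStream.read() returns each
    // raw byte (0-255) with no charset decoding, so the count matches
    // the real file offset exactly.
    static List<Long> findOffsets(String path, int target) throws IOException {
        List<Long> offsets = new ArrayList<>();
        try (InputStream in = new FileInputStream(path)) {
            int c;
            long i = 0;
            while ((c = in.read()) != -1) {
                if (c == target) {
                    offsets.add(i);
                }
                i++;
            }
        }
        return offsets;
    }

    public static void main(String[] args) throws IOException {
        if (args.length > 0) {
            for (long off : findOffsets(args[0], 0x47)) {
                System.out.println("Found G at 0x" + Long.toHexString(off));
            }
        }
    }
}
```

Wrapping the stream in a BufferedInputStream would avoid one system call per byte, but the offset arithmetic is unchanged either way.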
1 - The actual behavior depends on the character encoding that FileReader uses. For example, if the encoding is UTF-8, then bytes greater than 0x7f are treated as part of a multi-byte character. If you read arbitrary binary data as if it were UTF-8 encoded text, the result is liable to be garbage. Certainly, I would expect the offsets to be "off".