I have written the following code, which writes 4000 bytes of 0s to a file test.txt. Then I read the same file in chunks of 1000 bytes at a time.
FileOutputStream output = new FileOutputStream("test.txt");
ObjectOutputStream stream = new ObjectOutputStream(output);
byte[] bytes = new byte[4000];
stream.write(bytes);
stream.close();
FileInputStream input = new FileInputStream("test.txt");
ObjectInputStream s = new ObjectInputStream(input);
byte[] buffer = new byte[1000];
int read = s.read(buffer);
while (read > 0) {
    System.out.println("Read " + read);
    read = s.read(buffer);
}
s.close();
What I expect to happen is to read 1000 bytes four times.
Read 1000
Read 1000
Read 1000
Read 1000
However, what actually happens is that I seem to get "paused" (for lack of a better word) every 1024 bytes.
Read 1000
Read 24
Read 1000
Read 24
Read 1000
Read 24
Read 928
If I try to read more than 1024 bytes, then I get capped at 1024 bytes. If I try to read less than 1024 bytes, I'm still required to pause at the 1024 byte mark.
Upon inspecting the output file test.txt in hexadecimal, I noticed that the 5-byte sequence 7A 00 00 04 00 recurs every 1029 bytes, despite the fact that I have written only 0s to the file. Here is the output from my hex editor. (Would be too long to fit in question.)
So my question is: why are these five bytes appearing in my file when I have written only 0s? Do these 5 bytes have something to do with the pause that occurs every 1024 bytes? Why is this necessary?
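The object streams use an internal 1024-byte buffer and write primitive data in chunks of that size, as blocks in the stream headed by Block Data markers, which are, guess what, 0x7A followed by a 32-bit length word (or 0x77 followed by an 8-bit length word). In your file the marker is 7A 00 00 04 00, i.e. 0x7A plus the length 0x00000400 = 1024, and that 5-byte header plus 1024 bytes of data is why the markers are 1029 bytes apart. So a single read can never return more than 1024 bytes.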
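To make the block structure visible, here is a minimal sketch that writes zeros through an ObjectOutputStream into memory and then walks the raw bytes, collecting the length of each Block Data record. The class name BlockDataDemo and the method blockLengths are made up for illustration, and the parser assumes the stream holds nothing but the 4-byte stream header followed by block data records, which is only true because no objects are written.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of the JDK.
class BlockDataDemo {
    static final int TC_BLOCKDATA = 0x77;      // marker followed by an 8-bit length
    static final int TC_BLOCKDATALONG = 0x7A;  // marker followed by a 32-bit length

    // Writes payloadSize zero bytes via ObjectOutputStream and returns
    // the length of every block data record found in the raw output.
    static List<Integer> blockLengths(int payloadSize) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(sink)) {
            out.write(new byte[payloadSize]);
        }

        List<Integer> lengths = new ArrayList<>();
        try (DataInputStream in =
                 new DataInputStream(new ByteArrayInputStream(sink.toByteArray()))) {
            in.readShort(); // stream magic number
            in.readShort(); // stream version
            int tag;
            while ((tag = in.read()) != -1) {
                int length;
                if (tag == TC_BLOCKDATA) {
                    length = in.readUnsignedByte();
                } else if (tag == TC_BLOCKDATALONG) {
                    length = in.readInt();
                } else {
                    // Assumption: only block data follows the header.
                    throw new IOException("Unexpected tag 0x" + Integer.toHexString(tag));
                }
                lengths.add(length);
                in.skipBytes(length); // skip the payload of this block
            }
        }
        return lengths;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(blockLengths(4000));
    }
}
```

On a JDK that behaves as described in the question, the list should mirror the "pauses" seen when reading: three blocks of 1024 bytes and a final block of 928. The block size is an implementation detail of the JDK's object streams, so treat the exact numbers as observed behavior rather than a guarantee.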
The real question here is why you're using object streams just to read and write bytes. Use buffered streams. Then the buffering is under your control, and incidentally there's zero space overhead, unlike the object streams which have stream headers and type codes.
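As a rough sketch of that suggestion, the following writes 4000 zero bytes through a BufferedOutputStream and reads them back in 1000-byte chunks through a BufferedInputStream. The class name BufferedChunks and the method writeAndReadChunks are hypothetical, and readNBytes assumes Java 9 or later; it is used because a plain read(byte[]) is allowed to return fewer bytes than requested.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class for illustration only.
class BufferedChunks {
    // Writes `total` zero bytes to `file`, then reads them back in pieces
    // of at most `chunk` bytes and returns the size of each piece.
    static List<Integer> writeAndReadChunks(Path file, int total, int chunk)
            throws IOException {
        try (OutputStream out = new BufferedOutputStream(Files.newOutputStream(file))) {
            out.write(new byte[total]);
        }

        List<Integer> sizes = new ArrayList<>();
        try (InputStream in = new BufferedInputStream(Files.newInputStream(file))) {
            byte[] buffer = new byte[chunk];
            int read;
            // readNBytes keeps reading until the buffer is full or the
            // stream ends; it returns 0 (not -1) at end of stream.
            while ((read = in.readNBytes(buffer, 0, buffer.length)) > 0) {
                sizes.add(read);
            }
        }
        return sizes;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("chunks", ".bin");
        System.out.println(writeAndReadChunks(file, 4000, 1000));
        System.out.println("File size: " + Files.size(file));
        Files.delete(file);
    }
}
```

With plain byte streams the file contains exactly the 4000 bytes written, with no stream header or block markers, and the reads come back as four chunks of 1000 bytes.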
NB: Serialized data is not text and shouldn't be stored in files named .txt.