Subtleties in Reading/Writing Binary Data with Java


One phenomenon I've noticed with Java file reads using a byte array buffer is that, just like with C's fread(), if I don't account for the length of the final read and the total size of the data is not a multiple of the buffer size, excess garbage data can end up in the output file. When performing binary I/O, some copied files ended up somewhat corrupted.

The garbage values are presumably values previously stored in the buffer that were not overwritten, since the final read did not fill the whole buffer. In the various tutorials I looked over, all the methods of reading binary data were similar to the code below:

InputStream inputStream = new FileInputStream("prev_font.ttf");
OutputStream outputStream = new FileOutputStream("font.ttf");
byte buffer[] = new byte[512];
int read;
while((read = inputStream.read(buffer)) != -1)
{
    outputStream.write(buffer, 0, read);
}
outputStream.close();
inputStream.close();

But when reading from an input stream for a file packaged in a JAR, I couldn't make a proper copy of the file. The output was an invalid file of that type.

Since I was quite new to JAR access, I could not pinpoint whether the issue was with my resource file paths or something else, so it took quite a bit of time to realize what was going on. All the code I came across was missing a vital part. The amount written should not be the entire buffer, but only the amount that was actually read:

InputStream inputStream = new FileInputStream("prev_font.ttf");
OutputStream outputStream = new FileOutputStream("font.ttf");
byte dataBuffer[] = new byte[512];
int read;
while((read = inputStream.read(dataBuffer)) != -1)
{
    outputStream.write(dataBuffer, 0, read);
}
outputStream.close();
inputStream.close();

That's all fine now, but why was something so major not mentioned in any of the tutorials? Did I simply look at bad tutorials, or was Java supposed to handle the overflow reads and my implementation was off somehow? It was simply unexpected.

Please correct me if any of my statements were wrong, and kindly provide alternative solutions to handling the issue if there are any.


Solution

  • Apart from minor details, there isn't much difference between the code blocks you've provided: both pass the number of bytes read to write(). The buffer itself is not corrupted by read(). The output file is corrupted only if the number of bytes read is not passed to the writer on each iteration of the loop, because a short read leaves stale bytes from the previous iteration in the rest of the buffer.
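
    A minimal sketch of why the read count matters, using in-memory streams instead of files so that it runs on its own. The class and method names (PartialReadDemo, copyWholeBuffer, copyReadCount) are made up for illustration and do not come from your code:

    ```java
    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.util.Arrays;

    class PartialReadDemo {
        // Buggy variant: always writes the whole buffer, even when read() filled only part of it.
        static byte[] copyWholeBuffer(byte[] source, int bufferSize) throws IOException {
            InputStream in = new ByteArrayInputStream(source);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[bufferSize];
            while (in.read(buffer) != -1) {
                out.write(buffer); // bytes past the read count are stale, but get written anyway
            }
            return out.toByteArray();
        }

        // Correct variant: writes only the bytes that the last read() actually delivered.
        static byte[] copyReadCount(byte[] source, int bufferSize) throws IOException {
            InputStream in = new ByteArrayInputStream(source);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[bufferSize];
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
            return out.toByteArray();
        }

        public static void main(String[] args) throws IOException {
            byte[] source = {1, 2, 3, 4, 5, 6}; // 6 bytes, not a multiple of the buffer size 4
            System.out.println(Arrays.toString(copyWholeBuffer(source, 4)));
            System.out.println(Arrays.toString(copyReadCount(source, 4)));
        }
    }
    ```

    With a 4-byte buffer, the second read fills only the first two slots, so the buggy variant prints [1, 2, 3, 4, 5, 6, 3, 4] (the trailing 3, 4 are left over from the first read), while the correct variant prints [1, 2, 3, 4, 5, 6].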

    To copy a file, say src -> dst, just use try-with-resources and the built-in transferTo (available on InputStream since Java 9):

    Path src = Path.of("prev_font.ttf");
    Path dst = Path.of("font.ttf");
    try(InputStream in  = Files.newInputStream(src);
       OutputStream out = Files.newOutputStream(dst)) {
        in.transferTo(out);
    }
    

    Or call one of the built-in methods of Files:

    Files.copy(src, dst);
    // or
    Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING);
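
    Since the question mentions a file packaged in a JAR, where there is usually only an InputStream and no Path for the source, here is a hedged sketch using the Files.copy overload that takes an InputStream. The class ResourceCopy, its method names, and any resource name passed to copyResource are hypothetical. The main method uses an in-memory stream as a stand-in so that the sketch runs without a JAR:

    ```java
    import java.io.ByteArrayInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.StandardCopyOption;

    class ResourceCopy {
        // Copies everything from 'in' to 'dst', replacing dst if it already exists.
        // Returns the number of bytes copied.
        static long copyToFile(InputStream in, Path dst) throws IOException {
            return Files.copy(in, dst, StandardCopyOption.REPLACE_EXISTING);
        }

        // Hypothetical helper: looks up 'resourceName' on the classpath (for example inside a JAR).
        static long copyResource(String resourceName, Path dst) throws IOException {
            try (InputStream in = ResourceCopy.class.getResourceAsStream(resourceName)) {
                if (in == null) {
                    // getResourceAsStream returns null when the resource cannot be found
                    throw new IOException("Resource not found: " + resourceName);
                }
                return copyToFile(in, dst);
            }
        }

        public static void main(String[] args) throws IOException {
            // Stand-in for a real resource stream (assumption: no JAR is available here).
            byte[] data = {10, 20, 30, 40, 50};
            Path dst = Files.createTempFile("font", ".ttf");
            long copied = copyToFile(new ByteArrayInputStream(data), dst);
            System.out.println(copied + " bytes copied");
            Files.delete(dst);
        }
    }
    ```

    Checking for null separately helps tell a wrong resource path apart from a copy problem, which was the ambiguity described in the question.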