
Java InputStream read buffer


Say I'm trying to read from a Java InputStream like this:

ZipInputStream zis = new ZipInputStream(new FileInputStream("C:\\temp\\sample3.zip"));
zis.getNextEntry();
byte[] buffer2 = new byte[2];
int count = zis.read(buffer2);
if (count != -1) {
    // process...
} else {
    // something wrong, abort
}

I'm parsing a binary file, and I set the buffer size to 2 in this case because I want to read the next short. I would use a buffer of size 4 to read the next int, and so on for other types. The problem is that sometimes zis.read(buffer2) doesn't fill the buffer, even when I know there is enough unread data to fill it. I could simply dump the entire file contents into an array and parse that, but then I end up implementing my own stream reader, which seems like re-inventing the wheel. I could also implement a read() function that checks the read count and, if it is less than the buffer size, requests more data until the buffer is full, but that seems inefficient and ugly. Is there a better way to do this?

This is a follow-up to an earlier question, "Java ZipInputStream extraction errors".


Solution

  • Is there a better way to do this?

    Well ... a ZipInputStream ultimately inherits from InputStream so you should be able to wrap it with a BufferedInputStream and then a DataInputStream and read data using readShort, readInt and so on.

    Something like this:

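    while (zis.getNextEntry() != null) {
      // Wrap the current entry; the wrappers are deliberately not closed (see the note below).
      DataInputStream dis = new DataInputStream(new BufferedInputStream(zis));
      boolean done = false;
      do {
        short s = dis.readShort();  // throws EOFException if the entry ends first
        int i = dis.readInt();
        ...
        // set done = true once this entry's data has been consumed
      } while (!done);
    }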
    

    NB: you shouldn't close the dis stream as that would cause the zis to be closed. (Obviously, the zis needs to be closed at an outer level to avoid a resource leak.)
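To make the stream lifetimes concrete, here is a minimal, self-contained sketch of the approach above. The class name ZipRecordReader, the helper sampleZip, and the record layout (a short followed by an int, repeated until the entry ends) are assumptions made up for illustration; the original question does not describe the file format. The zis is closed by a try-with-resources block at the outer level, and the per-entry dis is never closed.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class ZipRecordReader {

    // Reads every entry as a sequence of (short, int) records.
    // This layout is hypothetical; adapt the read calls to the real format.
    static List<long[]> readRecords(ZipInputStream zis) throws IOException {
        List<long[]> records = new ArrayList<>();
        while (zis.getNextEntry() != null) {
            // Not closed on purpose: closing dis would also close zis.
            DataInputStream dis = new DataInputStream(new BufferedInputStream(zis));
            while (true) {
                short s;
                try {
                    s = dis.readShort();
                } catch (EOFException e) {
                    break; // end of this entry (a single stray trailing byte also ends up here)
                }
                int i = dis.readInt(); // an EOFException here means a truncated record
                records.add(new long[] {s, i});
            }
        }
        return records;
    }

    // Builds a small in-memory zip so the sketch runs without any file on disk.
    static byte[] sampleZip() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            zos.putNextEntry(new ZipEntry("sample.bin"));
            DataOutputStream dos = new DataOutputStream(zos);
            dos.writeShort(7);
            dos.writeInt(123456);
            dos.writeShort(-2);
            dos.writeInt(42);
            dos.flush();
            zos.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // zis is closed here, at the outer level.
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(sampleZip()))) {
            for (long[] r : readRecords(zis)) {
                System.out.println(r[0] + " " + r[1]);
            }
        }
    }
}
```

Unlike a bare read(byte[]), readShort and readInt either return a complete value or throw EOFException, so there is no partially filled buffer to check for.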

    The BufferedInputStream in the stack ensures that the many small reads made by readShort, readInt and so on are served from an in-memory buffer instead of each one going to the underlying stream, which would be slow.

    The only possible gotcha is that DataInputStream's methods have particular ideas about how the binary data is represented; e.g. numbers are big-endian. If that is an issue, consider reading the entire zip entry into a byte array and wrapping it in a ByteBuffer, whose byte order can be set explicitly.
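If the data turns out to be little-endian, the ByteBuffer route might look roughly like the sketch below. The class name LittleEndianEntry and the helpers readAll and wrapLittleEndian are hypothetical names for illustration, and a ByteArrayInputStream stands in for a ZipInputStream positioned on an entry. It assumes the entry is small enough to hold in memory.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class LittleEndianEntry {

    // Drains the stream into a byte array. For a ZipInputStream positioned
    // on an entry, read() returns -1 at the end of that entry.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n); // read() may return fewer bytes than requested
        }
        return out.toByteArray();
    }

    // ByteBuffer defaults to big-endian, so the order is set explicitly.
    static ByteBuffer wrapLittleEndian(byte[] data) {
        return ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
    }

    public static void main(String[] args) throws IOException {
        // Little-endian encoding of the short 0x1234 followed by the int 0x12345678.
        byte[] raw = {0x34, 0x12, 0x78, 0x56, 0x34, 0x12};
        ByteBuffer buf = wrapLittleEndian(readAll(new ByteArrayInputStream(raw)));
        short s = buf.getShort();
        int i = buf.getInt();
        System.out.printf("0x%04X 0x%08X%n", s, i); // prints "0x1234 0x12345678"
    }
}
```

The getShort and getInt calls throw BufferUnderflowException when too few bytes remain, so a truncated entry still fails loudly instead of yielding a half-read value.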