Using CipherInputStream with Sockets, finding size of unencrypted file

I'm sending files between two machines using the common practice of sending the dimension of the file first, in bytes, and then having the other side read the stream until it has received the exact amount of bytes, which it writes into a BufferedOutputStream wrapping a FileOutputStream. Something like this (receiving end):

long dimension = (long) inStream.readObject();
BufferedOutputStream receivedFileBuffer = new BufferedOutputStream(new FileOutputStream("receivedFile"));

byte[] buffer = new byte[1024];

long count = 0;
int bytesRead;
while (count < dimension) {
    bytesRead = inStream.read(buffer, 0, (int) Math.min(dimension - count, buffer.length));
    if (bytesRead == -1) {
        // The stream ended before 'dimension' bytes arrived.
        break;
    }

    receivedFileBuffer.write(buffer, 0, bytesRead);

    count += bytesRead;
}

receivedFileBuffer.flush();
receivedFileBuffer.close();

However, the file I'm sending is actually encrypted with AES, and on the sending-end I'm reading the data through a CipherInputStream which decrypts it before sending it through the socket, such as:

// Previously created AES Cipher, buffered FileInputStream, etc
CipherInputStream decryptionStream = new CipherInputStream(fileInputStream, decryptionCipher);

byte[] buffer = new byte[1024];

int bytesRead;
while ((bytesRead = decryptionStream.read(buffer, 0, 1024)) > 0) {
    outStream.write(buffer, 0, bytesRead);
}

decryptionStream.close();
fileInputStream.close();
outStream.flush();

My issue here is with padding. Since my implementation requires me to send the filesize before the transfer, I'm running into an issue in which it will send the size of the encrypted file, which can be anywhere from 1-16 bytes larger because of AES padding.

So what happens is that the receiving-end expects to receive size-of-encrypted-file bytes, while in reality the CipherInputStream will only ever yield size-of-unencrypted-file bytes. Is there any way to know what the size of the unencrypted file will be without having to load it all into memory?

I'm not looking to change my implementation to use an AES mode which requires no padding, as I'm not looking to store an IV. Thanks in advance.

Solution

I'm sending files between two machines using the common practice of sending the dimension of the file first, in bytes, and then having the other side read the stream until it has received the exact amount of bytes

This 'common practice' cannot be used with crypto, as you've found out. Padding is theoretically predictable, but it's a detail that most crypto libraries intentionally do not expose. So it's hacky to say the least, and it requires that you pick up a fairly deep understanding of the AES algorithm (or rather, of the padding/IV scheme you specify). Given that you'd be writing code geared specifically at the exact padding scheme you picked, you'd also hamstring yourself: if you ever want to change the algorithm in use, it's now more complicated than simply replacing the call that sets up the cipher stream. You also need to adjust your 'padding size calculator' code, which is non-trivial.

Bad idea, in other words.
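To make the coupling concrete, here is a minimal sketch of the arithmetic, assuming PKCS#5/PKCS#7 padding and the 16-byte AES block size (the class and method names are made up for illustration). It only goes from plaintext size to ciphertext size; going the other way, which is what you need, cannot be done from the length alone, because the pad count sits inside the encrypted final block.

```java
final class PaddingMath {
    // AES always works on 16-byte blocks, regardless of key size.
    static final int AES_BLOCK_SIZE = 16;

    // Ciphertext length for a given plaintext length, ASSUMING PKCS#5/PKCS#7 padding.
    // This scheme always adds between 1 and 16 bytes, even when the input is block-aligned.
    static long paddedLength(long plainLength) {
        return (plainLength / AES_BLOCK_SIZE + 1) * AES_BLOCK_SIZE;
    }

    public static void main(String[] args) {
        System.out.println(paddedLength(0));   // 16
        System.out.println(paddedLength(15));  // 16
        System.out.println(paddedLength(16));  // 32
    }
}
```

Switch to a different padding scheme or a streaming mode and this formula is simply wrong, which is the maintenance trap described above.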

General note

Your snippet to 'copy' bytes from one stream to another is problematic. You aren't using try-with-resources, so any crash halfway through leaks resources, and you're not using .transferTo or readAllBytes/readNBytes. You are also doing your own buffering with a byte[] on top of a BufferedOutputStream, creating 2 buffers for no good reason.

Keep that in mind for the snippets in the rest of this answer as I'm fixing those oversights on the fly.

Solution 1: The bad one

You can of course just run the whole thing through the cipher to disk or memory first, and THEN you can trivially send the size of the processed data first, followed by the processed bytes. But this requires either disk space or memory space, and you want to avoid that.
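For completeness, a rough sketch of what that could look like when spooling to a temporary file. The class and method names are hypothetical, and the 8-byte length prefix written with DataOutputStream.writeLong is an assumption of this sketch, not the readObject-based framing from the question.

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;

final class SpoolThenSend {
    // Runs 'source' through 'cipher' into a temp file, then writes the exact size
    // (8 bytes, big-endian) followed by the processed bytes. Returns that size.
    static long spoolAndSend(InputStream source, Cipher cipher, OutputStream socketOut)
            throws IOException {
        Path tmp = Files.createTempFile("spool", ".bin");
        try {
            try (InputStream in = new CipherInputStream(source, cipher)) {
                Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            }
            long size = Files.size(tmp);  // now known exactly, padding already handled
            DataOutputStream out = new DataOutputStream(socketOut);
            out.writeLong(size);
            Files.copy(tmp, out);
            out.flush();                  // the caller still owns and closes socketOut
            return size;
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

The temp file is the cost: the whole payload hits the disk once before a single byte goes over the wire.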

Solution 2: The simple one

Why are you sending the size first? If the stream just ends once you flush and close it, as it seems to, there is no need to do that. TCP/IP streams are perfectly capable of signalling that they are closed. Your server should:

Create an InputStream for the file to be sent.
Wrap that in a cipher stream (a CipherInputStream).
Open an OutputStream on a socket. Or web connection - they can signal 'close' out of band just as well.
Just transferTo it all, then close all streams.
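Those four steps could be sketched roughly as follows. The class and method names are hypothetical, and the cipher is passed in already initialised, since how you set it up depends on your key handling.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;

final class SendAndClose {
    // Pipes 'fileIn' through 'cipher' into 'socketOut', then closes both.
    // Closing the socket's output stream is what tells the other side "that was everything".
    static long send(InputStream fileIn, Cipher cipher, OutputStream socketOut)
            throws IOException {
        try (InputStream in = new CipherInputStream(fileIn, cipher);
             OutputStream out = socketOut) {
            return in.transferTo(out);  // returns the number of bytes written
        }
    }
}
```

No size is sent anywhere: the receiver just reads until end-of-stream.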

And your client does the same, pretty much. Looks like this:

try (var fileOut = new FileOutputStream("receivedFile");
  var cipher = new CipherInputStream(fromSocket, decryptionCipher)) {

  cipher.transferTo(fileOut);
}

Look at that beauty. So small, so simple. It hands off all that messy business with buffers and counting to the transferTo method. You don't have to use transferTo: to adapt your existing code, instead of counting how many bytes you've processed, you simply loop until in.read(buffer) returns -1, then break out. You've transferred it all. That's essentially what .transferTo does, in fact.
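If you are stuck on a Java version without transferTo, the hand-written equivalent is roughly this (a sketch; the class name and the 8192-byte buffer size are arbitrary choices of mine):

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

final class ManualCopy {
    // Copies everything from 'in' to 'out' until end-of-stream; returns the byte count.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        while (true) {
            int r = in.read(buffer);
            if (r == -1) break;       // -1 means the other side closed the stream
            out.write(buffer, 0, r);  // only the first r bytes of the buffer are valid
            total += r;
        }
        return total;
    }
}
```

Note there is no size anywhere in the loop condition: end-of-stream is the only terminator.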

Solution 3: Chunking

Perhaps you have a need to keep the streams open: It's a complex protocol where you are sending many different concepts across the line; you cannot rely on close() to be a signal.

In that case it gets harder. Now you really do need to develop a little protocol of sorts. A simple one is to interleave sizes with a special sentinel value that indicates done.

To send:

// SERVER CODE:
// You have a 'socketOutputStream' from somewhere.

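try (var fileIn = new FileInputStream("toSend");
  var cipherOut = new CipherOutputStream(socketOutputStream, crypto)) {

  // 2 bytes of chunk header, followed by up to 65535 bytes of data.
  byte[] buffer = new byte[65537];
  while (true) {
    int r = fileIn.read(buffer, 2, 65535);
    if (r == -1) break;
    // Chunk size as an unsigned, big-endian 2-byte value.
    buffer[0] = (byte) (r >> 8);
    buffer[1] = (byte) r;
    cipherOut.write(buffer, 0, r + 2);
  }
  // A size of 0 is the sentinel: no more chunks follow.
  buffer[0] = 0; buffer[1] = 0;
  cipherOut.write(buffer, 0, 2);
  cipherOut.flush();
  // Caution: leaving this try block closes cipherOut, which also closes socketOutputStream.
}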

This code keeps sticking the size of the read data as an unsigned 2-byte value at the front, sending size = 0 at the very end which tells the client: We're done now. This system can be used to encrypt files many GBs in size without needing lots of memory or disk space.
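The only subtle part is the unsigned 2-byte header, because Java's byte type is signed. Here is a small sketch of just that encoding, pulled out into hypothetical helper methods (not part of any library) so it can be checked in isolation:

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

final class ChunkHeader {
    // Writes 'length' (0..65535) as two bytes, most significant byte first.
    static void writeLength(OutputStream out, int length) throws IOException {
        if (length < 0 || length > 0xFFFF) {
            throw new IllegalArgumentException("length out of range: " + length);
        }
        out.write(length >> 8);  // OutputStream.write(int) keeps only the low 8 bits
        out.write(length);
    }

    // Reads two bytes and reassembles the unsigned value.
    static int readLength(InputStream in) throws IOException {
        int hi = in.read();      // InputStream.read() returns 0..255, or -1 at end-of-stream
        int lo = in.read();
        if ((hi | lo) < 0) throw new EOFException("stream ended inside a chunk header");
        return (hi << 8) | lo;
    }
}
```

This is the same wire format that DataOutputStream.writeShort and DataInputStream.readUnsignedShort use, so those are an alternative if you would rather not do the bit shifting yourself.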

The client does a similar task in reverse:

// CLIENT CODE:
// You have a 'socketInputStream' from somewhere.

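try (var fileOut = new FileOutputStream("toReceive");
  var cipherIn = new CipherInputStream(socketInputStream, crypto)) {
  byte[] sizeBuffer = new byte[2];
  byte[] dataBuffer = new byte[65535];
  while (true) {
    // readNBytes returns fewer bytes than requested only if the stream ended early.
    if (cipherIn.readNBytes(sizeBuffer, 0, 2) < 2) throw new EOFException("stream ended inside a chunk header");
    int r = ((sizeBuffer[0] & 0xFF) << 8) | (sizeBuffer[1] & 0xFF);
    if (r == 0) break; // sentinel: the sender is done
    if (cipherIn.readNBytes(dataBuffer, 0, r) < r) throw new EOFException("stream ended inside a chunk");
    fileOut.write(dataBuffer, 0, r);
  }
}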

But, I repeat: This is pointlessly complicating matters if the stream is closed immediately afterwards anyway!

Solution 4: Multi-connection

This one too is only relevant if you need to do multiple tasks within a single 'session'.

A final option is to have a 'command' line and a 'data' line: You never send big data across the command line, you merely send simple text messages (encrypted or not - but because they are small, you don't need to worry about having to stream all this stuff, it can all be done in memory). One text message that the client sends to the server might be 'DOWNLOAD fileToSend'.

The server doesn't respond with the encrypted file. No, it responds with something like 'READY 91584145104981234098124', where that number is a randomly generated ID. The client is now supposed to open a second connection; if it's raw TCP/IP, to the same port, but it now sends 'FETCH 91584145104981234098124', at which point the server sends the file, encrypted, and then closes the connection (thus letting you use the simple .transferTo code). If it's the web, then your client opens https://myserver.com/fetch/91584145104981234098124 for a similar effect, though if it's web, https is already encrypting things, so I'm not sure why you'd need to encrypt it as well.
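The server-side bookkeeping for this can stay tiny. A sketch, assuming the 'DOWNLOAD', 'READY' and 'FETCH' message formats from the example above; the class, its method names, the 128-bit ID length, and the one-time-use rule are my own choices for illustration:

```java
import java.math.BigInteger;
import java.security.SecureRandom;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class TransferTickets {
    private final SecureRandom random = new SecureRandom();
    // Maps each issued ID to the file it unlocks; entries are removed on first use.
    private final Map<String, String> pending = new ConcurrentHashMap<>();

    // Command connection: turns 'DOWNLOAD <file>' into the 'READY <id>' reply.
    String handleDownload(String commandLine) {
        if (!commandLine.startsWith("DOWNLOAD ")) {
            throw new IllegalArgumentException("not a DOWNLOAD command");
        }
        String file = commandLine.substring("DOWNLOAD ".length());
        String id = new BigInteger(128, random).toString();  // unguessable decimal ID
        pending.put(id, file);
        return "READY " + id;
    }

    // Data connection: turns 'FETCH <id>' into the file to stream,
    // or null if the ID is unknown or was already used.
    String handleFetch(String commandLine) {
        if (!commandLine.startsWith("FETCH ")) return null;
        return pending.remove(commandLine.substring("FETCH ".length()));
    }
}
```

Whatever handleFetch returns is what you then stream with transferTo on the data connection before closing it. A real server would probably also expire tickets that are never fetched.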