I need help understanding an issue I am seeing with the Inflater and Deflater classes in Java. My requirement is very simple: I want to compress and decompress Java strings.
The issue is that if my string is shorter than 54 characters, decompression does not return all the characters of the string. My compression and decompression code is as follows:
public String compress(String payload) {
    Deflater deflater = new Deflater();
    deflater.setInput(payload.getBytes(StandardCharsets.ISO_8859_1));
    deflater.finish();
    byte[] output = new byte[payload.length()];
    int size = deflater.deflate(output);
    byte[] payloadArray = Arrays.copyOf(output, size); // I do this to make sure only the compressed data is returned
    deflater.end();
    return new String(payloadArray, StandardCharsets.ISO_8859_1);
}

public String decompress(String compressedPayload, int originalPayloadSize) {
    Inflater inflater = new Inflater();
    inflater.setInput(compressedPayload.getBytes(StandardCharsets.ISO_8859_1));
    byte[] output = new byte[originalPayloadSize];
    int orgSize = inflater.inflate(output);
    inflater.end();
    return new String(output, StandardCharsets.ISO_8859_1);
}
My test case is as follows:
@Test
void verify() {
    final String payload = "1 2 3 4 5 6 7 8 9 one two apple orange banana leaves "; // This fails!!
    CompressionDeCompression compressionDecompression = new CompressionDeCompression();
    String compressedPayload = compressionDecompression.compress(payload);
    Assertions.assertNotNull(compressedPayload);
    String decompressedPayload = compressionDecompression.decompress(compressedPayload, payload.length());
    Assertions.assertEquals(payload.length(), decompressedPayload.length());
    Assertions.assertEquals(payload, decompressedPayload);
}
The above test case fails with the following exception:
org.opentest4j.AssertionFailedError:
Expected :1 2 3 4 5 6 7 8 9 one two apple orange banana leaves
Actual :1 2 3 4 5 6 7 8 9 one two apple orange banana leaves
But if I simply add one more character to the payload, it works. For example:
final String payload = "1 2 3 4 5 6 7 8 9 one two apple orange banana leaves t";
In short this does work: final String payload = "1 2 3 4 5 6 7 8 9 one two apple orange banana leaves t";
This doesn't work: final String payload = "1 2 3 4 5 6 7 8 9 one two apple orange banana leaves ";
Can someone please help me understand this issue?
As far as I can tell, the problem is in this line:
byte[] output = new byte[payload.length()];
Java's Deflater does not necessarily produce output that is smaller than its input. This is especially true when the data is short or not easily compressible: Deflater uses the zlib format, which adds a header and a checksum to the output. For small inputs, this extra data can make the compressed output larger than the input, so an output array of payload.length() bytes can be too small, and the compressed stream gets truncated.
To confirm this, call
deflater.finished() // false in your case
right after the deflate call. It returns true once all input has been consumed and the end of the compressed stream has been written to the output. A result of false here means the output array was not large enough to hold all the compressed data.
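The check described above can be tried in isolation. The following is only a minimal sketch: the class name FinishedCheck and the helper fitsInOneCall are made up for illustration and are not part of the original code.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

// Hypothetical helper class, for illustration only.
class FinishedCheck {

    // Returns true if a single deflate() call into a buffer of the given
    // size was enough to hold the complete compressed stream.
    static boolean fitsInOneCall(String payload, int bufferSize) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(payload.getBytes(StandardCharsets.ISO_8859_1));
            deflater.finish();
            byte[] output = new byte[bufferSize];
            deflater.deflate(output);
            // false means compressed data is still pending inside the Deflater
            return deflater.finished();
        } finally {
            deflater.end();
        }
    }

    public static void main(String[] args) {
        String payload = "1 2 3 4 5 6 7 8 9 one two apple orange banana leaves ";
        // Buffer sized like in the question: may be too small
        System.out.println(fitsInOneCall(payload, payload.length()));
        // Generously sized buffer
        System.out.println(fitsInOneCall(payload, 4096));
    }
}
```

If the first call prints false while the second prints true, the output buffer size is the culprit rather than the Inflater side.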
As a rule of thumb, an output array of at least twice the input length, with a floor of about 100 bytes for very short strings, should be large enough in your case, so this should work:
public String compress(String payload) {
    Deflater deflater = new Deflater();
    deflater.setInput(payload.getBytes(StandardCharsets.ISO_8859_1));
    deflater.finish();
    // Leave room for the zlib overhead: twice the input, but never less than 100 bytes
    byte[] output = new byte[Math.max(100, payload.length() * 2)];
    int size = deflater.deflate(output);
    byte[] payloadArray = Arrays.copyOf(output, size); // keep only the bytes actually written
    deflater.end();
    return new String(payloadArray, StandardCharsets.ISO_8859_1);
}
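An alternative that avoids guessing the buffer size is to call deflate() repeatedly until finished() returns true. The sketch below shows one way to do that; the class name LoopingCodec, the 64-byte chunk size, and the choice to work on byte[] instead of String are assumptions made for the example, not something the original code does.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper class, for illustration only.
class LoopingCodec {

    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[64]; // small on purpose, to exercise the loop
            while (!deflater.finished()) {
                int n = deflater.deflate(chunk);
                out.write(chunk, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end();
        }
    }

    static byte[] decompress(byte[] compressed) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[64];
            while (!inflater.finished()) {
                int n = inflater.inflate(chunk);
                // No progress and nothing left to read: the stream is incomplete
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported stream");
                }
                out.write(chunk, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not valid zlib data", e);
        } finally {
            inflater.end();
        }
    }
}
```

With this approach decompress no longer needs the original payload size. If a String is still required, the byte arrays can be wrapped with StandardCharsets.ISO_8859_1 in the same way as in the question.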
UPD:
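The zlib wrapper adds 2 bytes of header at the start and 4 bytes of Adler-32 checksum at the end, for a total of 6 bytes of metadata. That is not the whole overhead, though: the deflate data itself also carries a block header and an end-of-block marker, so a one-character string typically compresses to 9 bytes, and length + 6 is not a safe bound. For reference, zlib's own compressBound() allows for roughly the input length plus 13 bytes plus a small fraction of the input length. Calling deflate() in a loop until finished() returns true avoids having to guess at all.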
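A quick way to sanity-check any size estimate is to measure the compressed size of a few short inputs directly. This is only a rough probe: the class name OverheadProbe and the sample strings are invented for the example, and the exact numbers can depend on the zlib build bundled with the JDK.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

// Hypothetical helper class, for illustration only.
class OverheadProbe {

    // Counts how many bytes Deflater emits for the payload at default settings.
    static int compressedSize(String payload) {
        byte[] input = payload.getBytes(StandardCharsets.ISO_8859_1);
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(input);
            deflater.finish();
            byte[] scratch = new byte[input.length + 1024]; // contents are discarded
            int total = 0;
            while (!deflater.finished()) {
                total += deflater.deflate(scratch);
            }
            return total;
        } finally {
            deflater.end();
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "a",
            "ab",
            "1 2 3 4 5 6 7 8 9 one two apple orange banana leaves "
        };
        for (String s : samples) {
            System.out.println(s.length() + " chars -> " + compressedSize(s) + " bytes");
        }
    }
}
```

Comparing the two numbers on each output line shows how much room the output buffer really needs for short strings.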