java scala gzip java-17

Gzip compression of a string gives different results in Java 11 vs Java 17


I applied gzip compression to the string "test-string" and Base64-encoded the result. With Scala 2.13.8 on Java 11.0.13 (Java HotSpot(TM) 64-Bit Server VM), the output is H4sIAAAAAAAAACtJLS7RLS4pysxLBwCFdJByCwAAAA==.

However, when I run the same code with Scala 2.13.8 on Java 17.0.4.1 (OpenJDK 64-Bit Server VM), it yields H4sIAAAAAAAA/ytJLS7RLS4pysxLBwCFdJByCwAAAA==. Both compressed strings decompress correctly to the original string "test-string".

I assume this could depend on several factors:

  • Default compression levels: the default compression level might differ between Java 11 and Java 17, resulting in different output for the same input.
  • Algorithm improvements: the gzip implementation in Java 17 may have been optimized, leading to different compression results.
  • Internal implementation details: the internals of gzip compression may have changed between Java 11 and Java 17, affecting the compressed output.

What could be the reason behind this? I am attaching the code below.

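import java.io.ByteArrayOutputStream
import java.util.zip.GZIPOutputStream
// Assumption: Base64OutputStream is the Apache Commons Codec class
import org.apache.commons.codec.binary.Base64OutputStream

val bos = new ByteArrayOutputStream("test-string".length)
val b64os = new Base64OutputStream(bos)
val gzip = new GZIPOutputStream(b64os)
gzip.write("test-string".getBytes("UTF-8"))
// Closing the outermost stream writes the gzip trailer and flushes the Base64 encoder
gzip.close()
val compressed = new String(bos.toByteArray, "UTF-8")
bos.close()
compressed.trim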

Solution

  • If we look at both of your outputs in hex, we get these two byte sequences (the Java 17 output first, then the Java 11 output):

    1f8b08000000000000ff2b492d2ed12d2e29cacc4b0700857490720b000000
                      ^^
    1f8b08000000000000002b492d2ed12d2e29cacc4b0700857490720b000000
                      ^^
    

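    Only one byte differs: it changed from 0 to 255 (ff in hex). That byte is the OS field of the gzip header (byte offset 9, see RFC 1952), where 0 means "FAT filesystem" and 255 means "unknown". Java 16 changed the value it writes from 0 to 255, according to https://bugs.openjdk.org/browse/JDK-8244706. The compression level and algorithm are not involved: the compressed payload and the trailer are byte-for-byte identical, which is why both strings decompress to the same text.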
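
The one-byte difference can be checked directly. The following is a minimal sketch in Scala, assuming Java 9 or later (for `InputStream.readAllBytes`). The object and method names (`GzipHeaderCheck`, `osField`, `gunzip`) are made up for illustration and do not come from the question. It decodes the two Base64 strings from the question, reads the byte at offset 9, lists the offsets at which the two arrays differ, and decompresses both.

```scala
import java.io.ByteArrayInputStream
import java.nio.charset.StandardCharsets
import java.util.Base64
import java.util.zip.GZIPInputStream

// Hypothetical helper object, named for illustration only.
object GzipHeaderCheck {
  // The two Base64 strings quoted in the question.
  val java11Output = "H4sIAAAAAAAAACtJLS7RLS4pysxLBwCFdJByCwAAAA=="
  val java17Output = "H4sIAAAAAAAA/ytJLS7RLS4pysxLBwCFdJByCwAAAA=="

  // Offset of the OS field in the fixed 10-byte gzip header (RFC 1952).
  val OsFieldOffset = 9

  def decode(b64: String): Array[Byte] = Base64.getDecoder.decode(b64)

  // The OS field as an unsigned value in the range 0 to 255.
  def osField(gz: Array[Byte]): Int = gz(OsFieldOffset) & 0xff

  // Offsets at which two equally long byte arrays differ.
  def differingOffsets(a: Array[Byte], b: Array[Byte]): Seq[Int] =
    a.indices.filter(i => a(i) != b(i))

  // Decompress a gzip byte array back to a UTF-8 string.
  def gunzip(gz: Array[Byte]): String = {
    val in = new GZIPInputStream(new ByteArrayInputStream(gz))
    try new String(in.readAllBytes(), StandardCharsets.UTF_8)
    finally in.close()
  }

  def main(args: Array[String]): Unit = {
    val a = decode(java11Output)
    val b = decode(java17Output)
    println(s"OS field: Java 11 = ${osField(a)}, Java 17 = ${osField(b)}")
    println(s"Differing offsets: ${differingOffsets(a, b).mkString(", ")}")
    println(s"Decompressed: ${gunzip(a)} / ${gunzip(b)}")
  }
}
```

If a test compares compressed output across JDK versions, comparing the decompressed content (or ignoring the header byte at offset 9) is likely more robust than comparing the raw Base64 strings.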