Tags: java, clojure, compression, gzip, deflate

Decompress zlib stream in Clojure


I have a binary file whose contents were created by zlib.compress in Python. Is there an easy way to open and decompress it in Clojure?

import zlib
import json

with open('data.json.zlib', 'wb') as f:
    f.write(zlib.compress(json.dumps(data).encode('utf-8')))

Basically, it isn't a gzip file; it is just bytes representing deflated data.
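For reference, zlib.compress does not emit bare DEFLATE data: it produces the zlib format (RFC 1950), i.e. a 2-byte header, the raw DEFLATE stream (RFC 1951) and a 4-byte big-endian Adler-32 checksum of the uncompressed input. A minimal standard-library sketch that takes such output apart; split_zlib is a hypothetical helper name, and it assumes no preset dictionary, which holds for plain zlib.compress output:

```python
import zlib

def split_zlib(blob):
    """Hypothetical helper: split zlib-format bytes into (header, deflate, trailer).

    Assumes no preset dictionary, which holds for plain zlib.compress output.
    """
    if len(blob) < 6:
        raise ValueError('too short for the zlib format')
    cmf, flg = blob[0], blob[1]
    # CMF low nibble 8 = DEFLATE; the 16-bit big-endian header is a multiple of 31
    if cmf & 0x0F != 8 or (cmf * 256 + flg) % 31 != 0:
        raise ValueError('not a zlib header')
    if flg & 0x20:
        raise ValueError('preset dictionary (FDICT) not handled in this sketch')
    return blob[:2], blob[2:-4], blob[-4:]

data = '{"hello": "world"}'.encode('utf-8')
header, deflate, trailer = split_zlib(zlib.compress(data))

print(header.hex())                    # 789c at the default compression level
print(zlib.decompress(deflate, -15))   # the middle is raw DEFLATE: negative wbits, no wrapper
print(int.from_bytes(trailer, 'big') == zlib.adler32(data))  # trailer = Adler-32 of the input
```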

I could only find these references, but they are not quite what I'm looking for (I think the first two are the most relevant):

Must I really implement this multi-line wrapper around java.util.zip, or is there a nice library out there? Actually, I'm not even sure whether these byte streams are compatible across libraries, or whether I'm just mixing and matching the wrong libraries.

Steps in Python:

>>> '{"hello": "world"}'.encode('utf-8')
b'{"hello": "world"}'
>>> zlib.compress(b'{"hello": "world"}')
b'x\x9c\xabV\xcaH\xcd\xc9\xc9W\xb2RP*\xcf/\xcaIQ\xaa\x05\x009\x99\x06\x17'
>>> [int(i) for i in zlib.compress(b'{"hello": "world"}')]
[120, 156, 171, 86, 202, 72, 205, 201, 201, 87, 178, 82, 80, 42, 207, 47, 202, 73, 81, 170, 5, 0, 57, 153, 6, 23]
>>> import numpy
>>> [numpy.int8(i) for i in zlib.compress(b'{"hello": "world"}')]
[120, -100, -85, 86, -54, 72, -51, -55, -55, 87, -78, 82, 80, 42, -49, 47, -54, 73, 81, -86, 5, 0, 57, -103, 6, 23]
>>> zlib.decompress(bytes([120, 156, 171, 86, 202, 72, 205, 201, 201, 87, 178, 82, 80, 42, 207, 47, 202, 73, 81, 170, 5, 0, 57, 153, 6, 23])).decode('utf-8')
'{"hello": "world"}'
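The numpy.int8 step above only reinterprets each byte in Java's signed range (-128 to 127), which is how a Java/Clojure REPL prints bytes. The same view is available from the standard library: array.array with typecode 'b' reads each byte as a signed 8-bit integer. A small sketch; to_clojure_literal is a hypothetical helper that prints a form ready to paste into a Clojure REPL:

```python
import array
import zlib

def to_clojure_literal(blob):
    """Hypothetical helper: render bytes as a Clojure (byte-array [...]) form.

    Values above 127 become value - 256, matching Java's signed byte type.
    """
    signed = array.array('b', blob).tolist()  # reinterpret each byte as a signed 8-bit int
    return '(byte-array [' + ' '.join(map(str, signed)) + '])'

print(to_clojure_literal(zlib.compress(b'{"hello": "world"}')))
```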

Decode attempt in Clojure:

; https://github.com/funcool/buddy-core/blob/master/src/buddy/util/deflate.clj#L40 without try-catch
(ns so.core
  (:import java.io.ByteArrayInputStream
           java.io.ByteArrayOutputStream
           java.util.zip.Deflater
           java.util.zip.DeflaterOutputStream
           java.util.zip.InflaterInputStream
           java.util.zip.Inflater
           java.util.zip.ZipException)
  (:gen-class))

(defn uncompress
  "Given compressed data as a byte array, uncompress it and return it as another byte array."
  ([^bytes input] (uncompress input nil))
  ([^bytes input {:keys [nowrap buffer-size]
                  ;; nowrap true: the Inflater expects raw DEFLATE data (no zlib header or checksum)
                  :or {nowrap true buffer-size 2048}
                  :as opts}]
   (let [buf  (byte-array (int buffer-size))
         os   (ByteArrayOutputStream.)
         inf  (Inflater. ^Boolean nowrap)]
     (with-open [is  (ByteArrayInputStream. input)
                 iis (InflaterInputStream. is inf)]
       (loop []
         (let [readed (.read iis buf)]
           (when (pos? readed)
             (.write os buf 0 readed)
             (recur)))))
     (.toByteArray os))))

(uncompress (byte-array [120, -100, -85, 86, -54, 72, -51, -55, -55, 87, -78, 82, 80, 42, -49, 47, -54, 73, 81, -86, 5, 0, 57, -103, 6, 23]))
ZipException invalid stored block lengths  java.util.zip.InflaterInputStream.read (InflaterInputStream.java:164)
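The exception is consistent with a format mismatch rather than incompatible libraries: the uncompress function above defaults to nowrap true, and `(Inflater. true)` expects raw DEFLATE data with no zlib header or checksum, whereas zlib.compress writes a zlib header. Python's zlib module exposes the same switch through its wbits argument, so the mismatch can be reproduced there with the bytes from the question. In this sketch a negative wbits plays the role of nowrap true, 15 plays the role of the default `(Inflater.)`, and 47 (32 + 15) asks zlib to auto-detect a zlib or gzip header. On the Clojure side the matching change would be to call uncompress with {:nowrap false}.

```python
import zlib

# The exact bytes from the question: zlib.compress(b'{"hello": "world"}')
blob = bytes([120, 156, 171, 86, 202, 72, 205, 201, 201, 87, 178, 82, 80, 42,
              207, 47, 202, 73, 81, 170, 5, 0, 57, 153, 6, 23])

# wbits=-15: raw DEFLATE, no header or checksum (the analogue of (Inflater. true));
# expected to fail with "invalid stored block lengths", as in the Java trace
try:
    zlib.decompress(blob, -15)
except zlib.error as e:
    print('raw deflate failed:', e)

# wbits=15: zlib wrapper expected (the analogue of the default (Inflater.))
print(zlib.decompress(blob, 15).decode('utf-8'))

# wbits=47 (32 + 15): accept either a zlib or a gzip header
print(zlib.decompress(blob, 47).decode('utf-8'))
```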

Any help would be appreciated. I wouldn't want to use zip or gzip files, as in this context I only care about the raw content, not file names or modification dates. But it is possible to use another compression algorithm on the Python side if that is the only option.
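If changing the Python side is acceptable, a different algorithm is not needed: the zlib module can also write raw DEFLATE data (no header, no checksum) through compressobj with a negative wbits. That is the format `(Inflater. true)` expects, so the question's uncompress with its default nowrap true should be able to read it. A sketch assuming the same JSON payload as in the question; raw_deflate is a hypothetical helper name, and its result can be written to the file exactly as before:

```python
import json
import zlib

def raw_deflate(payload):
    """Hypothetical helper: compress to raw DEFLATE (RFC 1951), no zlib header or trailer."""
    comp = zlib.compressobj(wbits=-15)  # negative wbits = raw stream with a 2**15-byte window
    return comp.compress(payload) + comp.flush()

data = {"hello": "world"}
blob = raw_deflate(json.dumps(data).encode('utf-8'))

# Round trip on the Python side with the matching raw setting
assert zlib.decompress(blob, -15) == b'{"hello": "world"}'
print(list(blob))
```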


Solution

  • Here is an easy way to do it with gzip:

    Python code:

    import gzip
    content = "the quick brown fox"
    with gzip.open('fox.txt.gz', 'wb') as f:
        f.write(content.encode('utf-8'))  # 'wb' mode needs bytes in Python 3
    

    Clojure code:

    (with-open [in (java.util.zip.GZIPInputStream.
                    (clojure.java.io/input-stream
                     "fox.txt.gz"))]
      (println "result:" (slurp in)))
    
    ;=>  result: the quick brown fox
    

    Keep in mind that "gzip" is a data format (DEFLATE-compressed data plus a small header and trailer), so using it does not mean you need the "gzip" command-line tool.

    Please note that the input to Clojure doesn't have to be a file. You could send the gzip compressed data as raw bytes over a socket and still decompress it on the Clojure side. Full details at: https://clojuredocs.org/clojure.java.io/input-stream

    Update

    If you need to use the pure zlib format instead of gzip, the result is very similar:

    Python code:

    import zlib
    with open('balloon.txt.z', 'wb') as fp:
        # zlib.compress needs bytes in Python 3
        fp.write(zlib.compress('the big red balloon'.encode('utf-8')))
    

    Clojure code:

    ;; InflaterInputStream's default Inflater expects the zlib header that zlib.compress writes
    (with-open [in (java.util.zip.InflaterInputStream.
                    (clojure.java.io/input-stream
                     "balloon.txt.z"))]
      (println "result:" (slurp in)))
    
    ;=> result: the big red balloon
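The variants discussed in this thread differ only in the wrapper around the same DEFLATE data, and each needs a different reader on the Java side: gzip needs java.util.zip.GZIPInputStream, zlib needs InflaterInputStream with a default Inflater, and raw DEFLATE needs an Inflater created with nowrap true. A rough heuristic sketch for telling them apart from the first bytes; guess_wrapper is a hypothetical name, and because raw DEFLATE has no magic number, its last branch is only a guess:

```python
import gzip
import zlib

def guess_wrapper(blob):
    """Hypothetical heuristic: guess which wrapper surrounds some DEFLATE data."""
    if blob[:2] == b'\x1f\x8b':
        return 'gzip'          # java.util.zip.GZIPInputStream
    if len(blob) >= 2 and blob[0] & 0x0F == 8 and (blob[0] * 256 + blob[1]) % 31 == 0:
        return 'zlib'          # InflaterInputStream with a default (Inflater.)
    return 'raw deflate?'      # no magic number to check; try (Inflater. true)

payload = 'the big red balloon'.encode('utf-8')
raw = zlib.compressobj(wbits=-15)  # negative wbits: raw DEFLATE, no wrapper
samples = {
    # in memory, so no file name is stored; mtime=0 zeroes the timestamp (Python 3.8+)
    'gzip.compress': gzip.compress(payload, mtime=0),
    'zlib.compress': zlib.compress(payload),
    'raw deflate': raw.compress(payload) + raw.flush(),
}
for name, blob in samples.items():
    print(name, '->', guess_wrapper(blob))
```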