
How do I decode a zlib stream in Go?


What is the issue?

I cannot decode valid compressed chunks from a zlib stream using Go's zlib package.

I have prepared a GitHub repo containing code and data that illustrate the issue: https://github.com/andreyst/zlib-issue.

What are those chunks?

They are messages generated by a text game server (MUD). The server sends a compressed stream of messages in multiple chunks; the first chunk contains the zlib header and the others do not.

I have captured two chunks (first and second) with a proxy called "mcclient", which is a sidecar that provides compression for MUD clients that do not support it. It is written in C and uses the C zlib library to decode compressed chunks.

The chunks are stored in the "chunks" directory and are numbered 0 and 1. *.in files contain compressed data, *.out files contain the uncompressed data captured from mcclient, and *.log files contain the status of zlib decompression (the return code of the inflate call).

A special all.in chunk is chunk 0 concatenated with chunk 1.

Why do I think they are valid?

  1. mcclient successfully decompresses the input chunks with the C zlib library. The *.log status shows 0, which is Z_OK, meaning no error in zlib parlance.
  2. zlib-flate -uncompress < chunks/all.in works without errors under Linux and decompresses to the same content. Under Mac OS it also decompresses to the same content, but with the warning zlib-flate: WARNING: zlib code -5, msg = input stream is complete but output may still be valid. That looks expected, because the chunks do not contain an "official" stream end.
  3. The Python code in decompress.py correctly decompresses both all.in and the 0/1 chunks without any issues.

What is the issue with Go's zlib?

See main.go: it tries to decompress those chunks, starting with all.in and then decompressing chunks 0 and 1 step by step.

An attempt to decode all.in (func all()) partly succeeds: the decompressed data is the same, but the zlib reader returns the error flate: corrupt input before offset 446.

In the real-life scenario of decompressing chunk by chunk (func stream()), the zlib reader decodes the first chunk with the expected data but returns the error flate: corrupt input before offset 32, and the subsequent attempt to decode chunk 1 fails completely.

The question

Is it possible to use Go's zlib package in some kind of "streaming" mode suited to a scenario like this? Maybe I am using it incorrectly?

If not, what is the workaround? It would also be interesting to know why that is so: is it by design? Is it just not implemented yet? What am I missing?


Solution

  • Notice that the error reports corrupt data at an offset past the end of your input. That is because of the way you are reading from the files:

        buf := make([]byte, 100000)
        n, readErr := f.Read(buf)
        if readErr != nil {
            log.Fatalf("readErr=%v\n", readErr)
        }
        fmt.Printf("Read bytes, n=%v\n", n)
    
        buffer := bytes.NewBuffer(buf)
        zlibReader, zlibErr := zlib.NewReader(buffer)
        if zlibErr != nil {
            log.Fatalf("zlibErr=%v\n", zlibErr)
        }
    

    buf := make([]byte, 100000) makes a slice of 100000 bytes, all of them 0. But you read only 443 bytes in the case of all.in. Since you never shorten the slice to buf[:n], the reader runs into the zero bytes that follow the valid data (more than 99,000 of them) and concludes that the input is corrupt. That is why you get both output and an error.
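
    The effect can be reproduced without the captured files. The sketch below is an illustration, not code from the repo: unterminated and inflate are hypothetical helpers, and the stream is produced with Flush and never closed, on the assumption that this resembles chunks that carry no stream end.

```go
package main

import (
	"bytes"
	"compress/flate"
	"compress/zlib"
	"errors"
	"fmt"
	"io"
)

// unterminated returns a zlib stream that ends in a sync flush instead of a
// final block and checksum. Assumption: this resembles the captured chunks.
func unterminated(msg string) []byte {
	var b bytes.Buffer
	w := zlib.NewWriter(&b)
	w.Write([]byte(msg)) // errors ignored: writes to a bytes.Buffer do not fail
	w.Flush()            // deliberately no Close, so no stream end is written
	return b.Bytes()
}

// inflate decompresses src and returns the output plus a label for the error.
func inflate(src []byte) (string, string) {
	zr, err := zlib.NewReader(bytes.NewReader(src))
	if err != nil {
		return "", "bad header"
	}
	out, err := io.ReadAll(zr)
	var corrupt flate.CorruptInputError
	switch {
	case err == nil:
		return string(out), "ok"
	case errors.As(err, &corrupt):
		return string(out), "corrupt input"
	case errors.Is(err, io.ErrUnexpectedEOF):
		return string(out), "unexpected EOF"
	default:
		return string(out), "other error"
	}
}

func main() {
	data := unterminated("hello from the MUD\n")

	// Simulate f.Read into an oversized buffer: n valid bytes, then zeros.
	buf := make([]byte, 100000)
	n := copy(buf, data)

	_, padded := inflate(buf)      // the zero padding is handed to the decoder
	_, trimmed := inflate(buf[:n]) // only the bytes that were actually read
	fmt.Println("whole buffer:", padded)
	fmt.Println("buf[:n]:     ", trimmed)
}
```

    With the whole buffer, the zero padding should be parsed as further deflate data and rejected. With buf[:n], the reader only runs out of input, which is a different and more honest error for a stream that has no end marker.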

    As for streaming: with a TCP/UDP connection you should be able to pass the connection, which is an io.Reader, straight to zlib.NewReader. To simulate that, I used an io.Pipe in the modified code:

    package main
    
    import (
        "bytes"
        "compress/zlib"
        "fmt"
        "io"
        "log"
        "os"
    )
    
    func main() {
        all()
        stream()
    
        // Alas it hangs :(
        // otherZlib() used github.com/4kills/go-zlib; the import is dropped
        // here because Go refuses to compile an unused import.
    }
    
    func all() {
        fmt.Println("==== RUNNING DECOMPRESSION OF all.in")
        fmt.Println("")
    
        buf, readErr := os.ReadFile("./chunks/all.in")
        if readErr != nil {
            log.Fatalf("readErr=%v\n", readErr)
        }
        fmt.Printf("Read bytes, n=%v\n", len(buf))
    
        buffer := bytes.NewBuffer(buf)
        zlibReader, zlibErr := zlib.NewReader(buffer)
        if zlibErr != nil {
            log.Fatalf("zlibErr=%v\n", zlibErr)
        }
    
        out := new(bytes.Buffer)
        written, copyErr := io.Copy(out, zlibReader)
        if copyErr != nil {
            log.Printf("copyErr=%v\n", copyErr)
        }
        fmt.Printf("Written bytes, n=%v, out:\n%v\n", written, out.String())
        fmt.Println("")
    }
    
    func stream() {
        fmt.Println("==== RUNNING DECOMPRESSION OF SEPARATE CHUNKS")
        fmt.Println("")
    
        pRead, pWrite := io.Pipe()
        go func() {
            buf, readErr := os.ReadFile("./chunks/0.in")
            if readErr != nil {
                log.Fatalf("readErr=%v\n", readErr)
            }
            fmt.Printf("Read 0 bytes, n=%v\n", len(buf))
    
            written0, copy0Err := io.Copy(pWrite, bytes.NewBuffer(buf))
            if copy0Err != nil {
                log.Printf("copy0Err=%v\n", copy0Err)
            }
            fmt.Printf("Written compressed bytes, n0=%v\n", written0)
    
            buf, readErr = os.ReadFile("./chunks/1.in")
            if readErr != nil {
                log.Fatalf("read1Err=%v\n", readErr)
            }
            fmt.Printf("Read 1 bytes, n=%v\n", len(buf))
    
            written1, copy1Err := io.Copy(pWrite, bytes.NewBuffer(buf))
            if copy1Err != nil {
                log.Printf("copy1Err=%v\n", copy1Err)
            }
            fmt.Printf("Written compressed bytes, n1=%v\n", written1)
    
            pWrite.Close()
        }()
    
        zlibReader, zlibErr := zlib.NewReader(pRead)
        if zlibErr != nil {
            log.Fatalf("zlibErr=%v\n", zlibErr)
        }
    
        out := new(bytes.Buffer)
        written2, copy2Err := io.Copy(out, zlibReader)
        if copy2Err != nil {
            log.Printf("copy2Err=%v\n", copy2Err)
        }
        fmt.Printf("Written decompressed bytes, n0=%v, out:\n%v\n", written2, out.String())
    
        fmt.Println("")
    }
    

    With this code I get no errors from stream(), but I still get a copyErr=unexpected EOF error from all(). It looks like all.in is missing the checksum data at the end, but I figure that is just an accident.
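
    For a long-lived connection it may be more useful to read decompressed data as it arrives than to io.Copy until the end. The following is a minimal sketch under assumptions: makeChunks and decodeStream are hypothetical names, the chunks are generated in memory by flushing a zlib.Writer after each message (so only the first one carries the header, as described in the question), and io.ErrUnexpectedEOF is treated as "the input ended without a stream end" rather than as a fatal error.

```go
package main

import (
	"bytes"
	"compress/zlib"
	"errors"
	"fmt"
	"io"
)

// makeChunks compresses all messages into one zlib stream, flushing after each
// message and never closing, so only the first chunk carries the zlib header.
func makeChunks(msgs ...string) [][]byte {
	var b bytes.Buffer
	w := zlib.NewWriter(&b)
	var chunks [][]byte
	for _, m := range msgs {
		w.Write([]byte(m)) // errors ignored: writes to a bytes.Buffer do not fail
		w.Flush()
		chunks = append(chunks, append([]byte(nil), b.Bytes()...))
		b.Reset()
	}
	return chunks
}

// decodeStream feeds the chunks through a pipe, much as a socket would deliver
// them, and returns the decompressed text and whether a stream end was seen.
func decodeStream(chunks [][]byte) (string, bool, error) {
	pr, pw := io.Pipe()
	go func() {
		for _, c := range chunks {
			if _, err := pw.Write(c); err != nil {
				return // the reading side was closed
			}
		}
		pw.Close()
	}()

	zr, err := zlib.NewReader(pr)
	if err != nil {
		pr.CloseWithError(err) // unblock the writer goroutine
		return "", false, err
	}

	var out bytes.Buffer
	buf := make([]byte, 4096)
	for {
		n, err := zr.Read(buf)
		out.Write(buf[:n]) // keep whatever was decoded before looking at err
		switch {
		case err == nil:
			continue
		case err == io.EOF:
			return out.String(), true, nil
		case errors.Is(err, io.ErrUnexpectedEOF):
			// The input ended without a final block and checksum.
			return out.String(), false, nil
		default:
			pr.CloseWithError(err)
			return out.String(), false, err
		}
	}
}

func main() {
	text, clean, err := decodeStream(makeChunks("first message\n", "second message\n"))
	fmt.Printf("text=%q clean=%v err=%v\n", text, clean, err)
}
```

    Whether an unexpected EOF should count as a normal end depends on the protocol; for a stream that the server never terminates, it may be the only end the reader ever sees.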