I cannot decode valid compressed chunks from a zlib stream using Go's zlib package.
I have prepared a github repo which contains code and data illustrating the issue I have: https://github.com/andreyst/zlib-issue.
The data are messages generated by a text game server (MUD). This game server sends a compressed stream of messages in multiple chunks; the first chunk contains the zlib header and the others do not.
I have captured two chunks (the first and the second) with a proxy called "mcclient", a sidecar that provides compression for MUD clients that do not support it. It is written in C and uses the C zlib library to decode compressed chunks.
Chunks are contained in the "chunks" directory and are numbered 0 and 1. *.in files contain compressed data. *.out files contain uncompressed data captured from mcclient. *.log files contain the status of zlib decompression (the return code of the inflate call).
A special all.in chunk is chunk 0 concatenated with chunk 1.
mcclient successfully decompresses the input chunks with C's zlib without any issues. The *.log status shows 0, which means Z_OK, i.e. no errors in zlib parlance.
zlib-flate -uncompress < chunks/all.in works without any errors under Linux and decompresses to the same content. Under Mac OS it also decompresses to the same content, but with the warning "zlib-flate: WARNING: zlib code -5, msg = input stream is complete but output may still be valid", which looks as expected because the chunks do not contain an "official" stream end.
decompress.py correctly decompresses both all.in and the 0/1 chunks without any issues.
See main.go: it tries to decompress those chunks, starting with all.in and then decompressing chunks 0 and 1 step by step.
An attempt to decode all.in (func all()) somewhat succeeds, in that the decompressed data is the same, but the zlib reader returns the error "flate: corrupt input before offset 446".
When trying the real-life scenario of decompressing chunk by chunk (func stream()), the zlib reader decodes the first chunk with the expected data but returns the error "flate: corrupt input before offset 32", and the subsequent attempt to decode chunk 1 fails completely.
Is it possible to use Go's zlib package in some kind of "streaming" mode suited to a scenario like this? Maybe I am using it incorrectly?
If not, what is the workaround? It would also be interesting to know why that is so: is it by design? Is it just not implemented yet? What am I missing?
Notice that the error says the data at an offset after the end of your input is corrupt. That is because of the way you are reading from the files:
buf := make([]byte, 100000)
n, readErr := f.Read(buf)
if readErr != nil {
	log.Fatalf("readErr=%v\n", readErr)
}
fmt.Printf("Read bytes, n=%v\n", n)
buffer := bytes.NewBuffer(buf)
zlibReader, zlibErr := zlib.NewReader(buffer)
if zlibErr != nil {
	log.Fatalf("zlibErr=%v\n", zlibErr)
}
buf := make([]byte, 100000) makes a slice of 100000 bytes, all of them 0, but f.Read only fills the first 443 of them in the case of all.in. Since you never shorten the slice to buf[:n], the reader runs into the zero padding right after the valid data and concludes the input is corrupt. That is why you get both output and an error.
As for streaming: in the case of a TCP/UDP connection you should be able to just pass the connection, which is an io.Reader, to zlib.NewReader. To simulate the same I used an io.Pipe in the modified code:
package main

import (
	"bytes"
	"compress/zlib"
	"fmt"
	"io"
	"log"
	"os"
)

func main() {
	all()
	stream()
	// otherZlib(), which used github.com/4kills/go-zlib, is left out: alas it hangs :(
}

// all decompresses the concatenated chunks from a single in-memory buffer.
func all() {
	fmt.Println("==== RUNNING DECOMPRESSION OF all.in")
	fmt.Println("")

	// os.ReadFile returns a slice of exactly the file's length, so there is
	// no zero padding after the compressed data.
	buf, readErr := os.ReadFile("./chunks/all.in")
	if readErr != nil {
		log.Fatalf("readErr=%v\n", readErr)
	}
	fmt.Printf("Read bytes, n=%v\n", len(buf))

	buffer := bytes.NewBuffer(buf)
	zlibReader, zlibErr := zlib.NewReader(buffer)
	if zlibErr != nil {
		log.Fatalf("zlibErr=%v\n", zlibErr)
	}

	out := new(bytes.Buffer)
	written, copyErr := io.Copy(out, zlibReader)
	if copyErr != nil {
		log.Printf("copyErr=%v\n", copyErr)
	}
	fmt.Printf("Written bytes, n=%v, out:\n%v\n", written, out.String())
	fmt.Println("")
}

// stream feeds the two chunks through an io.Pipe, one after the other, to
// simulate data arriving over a connection.
func stream() {
	fmt.Println("==== RUNNING DECOMPRESSION OF SEPARATE CHUNKS")
	fmt.Println("")

	pRead, pWrite := io.Pipe()
	go func() {
		buf, readErr := os.ReadFile("./chunks/0.in")
		if readErr != nil {
			log.Fatalf("readErr=%v\n", readErr)
		}
		fmt.Printf("Read 0 bytes, n=%v\n", len(buf))
		written0, copy0Err := io.Copy(pWrite, bytes.NewBuffer(buf))
		if copy0Err != nil {
			log.Printf("copy0Err=%v\n", copy0Err)
		}
		fmt.Printf("Written compressed bytes, n0=%v\n", written0)

		buf, readErr = os.ReadFile("./chunks/1.in")
		if readErr != nil {
			log.Fatalf("read1Err=%v\n", readErr)
		}
		fmt.Printf("Read 1 bytes, n=%v\n", len(buf))
		written1, copy1Err := io.Copy(pWrite, bytes.NewBuffer(buf))
		if copy1Err != nil {
			log.Printf("copy1Err=%v\n", copy1Err)
		}
		fmt.Printf("Written compressed bytes, n1=%v\n", written1)

		// Closing the write end makes the reader side see the end of input.
		pWrite.Close()
	}()

	// A single zlib reader consumes both chunks, so the header is parsed once.
	zlibReader, zlibErr := zlib.NewReader(pRead)
	if zlibErr != nil {
		log.Fatalf("zlibErr=%v\n", zlibErr)
	}

	out := new(bytes.Buffer)
	written2, copy2Err := io.Copy(out, zlibReader)
	if copy2Err != nil {
		log.Printf("copy2Err=%v\n", copy2Err)
	}
	fmt.Printf("Written decompressed bytes, n=%v, out:\n%v\n", written2, out.String())
	fmt.Println("")
}
With this code I get no errors from stream(), but I still get a copyErr=unexpected EOF error from all(). It looks like all.in is missing the checksum data at the end, but I figure that is just an accident.
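For a long-lived connection you would probably not use io.Copy into one buffer but read from the zlib reader as data arrives. Below is a rough sketch of that idea, assuming the server sync-flushes after every message and never finishes the stream (an assumption on my part). makeChunks and decodeChunks are hypothetical names, the sample messages are made up, and the io.Pipe again stands in for the network connection.

```go
package main

import (
	"bytes"
	"compress/zlib"
	"errors"
	"fmt"
	"io"
	"log"
)

// makeChunks (hypothetical) writes every message into one shared zlib stream
// and sync-flushes after each, so only the first chunk carries the zlib
// header and no chunk carries the trailing checksum.
func makeChunks(messages []string) ([][]byte, error) {
	var buf bytes.Buffer
	w := zlib.NewWriter(&buf)
	var chunks [][]byte
	for _, m := range messages {
		if _, err := w.Write([]byte(m)); err != nil {
			return nil, err
		}
		if err := w.Flush(); err != nil {
			return nil, err
		}
		chunks = append(chunks, append([]byte(nil), buf.Bytes()...))
		buf.Reset()
	}
	return chunks, nil // w.Close is skipped on purpose: the stream stays open
}

// decodeChunks (hypothetical) sends the chunks through a pipe and collects
// what each Read on a single zlib reader returns.
func decodeChunks(chunks [][]byte) ([]string, error) {
	pr, pw := io.Pipe()
	defer pr.Close() // unblocks the writer goroutine if we return early
	go func() {
		for _, c := range chunks {
			if _, err := pw.Write(c); err != nil {
				pw.CloseWithError(err)
				return
			}
		}
		pw.Close()
	}()

	zr, err := zlib.NewReader(pr)
	if err != nil {
		return nil, err
	}
	var reads []string
	buf := make([]byte, 4096)
	for {
		n, err := zr.Read(buf)
		if n > 0 {
			reads = append(reads, string(buf[:n]))
		}
		// The input ending before the stream is finished is expected here,
		// so an unexpected EOF is treated as the end of the captured data.
		if errors.Is(err, io.EOF) || errors.Is(err, io.ErrUnexpectedEOF) {
			return reads, nil
		}
		if err != nil {
			return reads, err
		}
	}
}

func main() {
	chunks, err := makeChunks([]string{"You enter a dark room.\n", "A goblin attacks!\n"})
	if err != nil {
		log.Fatal(err)
	}
	reads, err := decodeChunks(chunks)
	if err != nil {
		log.Fatal(err)
	}
	for i, r := range reads {
		fmt.Printf("read %d: %q\n", i, r)
	}
}
```

The important part is that one zlib reader lives as long as the connection, because only the first chunk has a header and later chunks may refer back to earlier data. Read boundaries are not guaranteed to line up with the server's messages, so any framing has to come from the decompressed protocol itself.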