I have a large int array that I want to persist on the filesystem. My understanding is that the best way to store something like this is to use the gob package to convert it to a byte array and then to compress it with gzip. When I need it again, I reverse the process. I am pretty sure I am storing it correctly; however, recovering it fails with EOF. Long story short, I have some example code below that demonstrates the issue (playground link: https://play.golang.org/p/v4rGGeVkLNh). I am not convinced gob is needed, but from reading around it seems more efficient to store the data as a byte array than as an int array, though that may not be true. Thanks!
package main

import (
    "bufio"
    "bytes"
    "compress/gzip"
    "encoding/gob"
    "fmt"
)

func main() {
    arry := []int{1, 2, 3, 4, 5}

    //now gob this
    var indexBuffer bytes.Buffer
    writer := bufio.NewWriter(&indexBuffer)
    encoder := gob.NewEncoder(writer)
    if err := encoder.Encode(arry); err != nil {
        panic(err)
    }

    //now compress it
    var compressionBuffer bytes.Buffer
    compressor := gzip.NewWriter(&compressionBuffer)
    compressor.Write(indexBuffer.Bytes())
    defer compressor.Close()
    //<--- I think all is good until here

    //now decompress it
    buf := bytes.NewBuffer(compressionBuffer.Bytes())
    fmt.Println("byte array before unzipping: ", buf.Bytes())
    if reader, err := gzip.NewReader(buf); err != nil {
        fmt.Println("gzip failed ", err)
        panic(err)
    } else {
        //now ungob it...
        var intArray []int
        decoder := gob.NewDecoder(reader)
        defer reader.Close()
        if err := decoder.Decode(&intArray); err != nil {
            fmt.Println("gob failed ", err)
            panic(err)
        }
        fmt.Println("final int Array content: ", intArray)
    }
}
You are using a bufio.Writer, which, as its name implies, buffers bytes written to it. This means that if you use it, you have to flush it to make sure buffered data makes its way to the underlying writer:
writer := bufio.NewWriter(&indexBuffer)
encoder := gob.NewEncoder(writer)
if err := encoder.Encode(arry); err != nil {
panic(err)
}
if err := writer.Flush(); err != nil {
panic(err)
}
The use of bufio.Writer is completely unnecessary here, though: you're already writing to an in-memory buffer (bytes.Buffer), so just skip it and have the gob encoder write directly to the bytes.Buffer (then you don't even have to flush):
var indexBuffer bytes.Buffer
encoder := gob.NewEncoder(&indexBuffer)
if err := encoder.Encode(arry); err != nil {
panic(err)
}
The next error is how you close the gzip stream:
defer compressor.Close()
This deferred close only runs when the enclosing function (the main() function) returns, not a second earlier. But you want to read the zipped data before then, and part of it may still sit in an internal buffer of the gzip.Writer rather than in compressionBuffer, so you can't read the complete compressed stream from compressionBuffer yet. Close the gzip stream without using defer:
if err := compressor.Close(); err != nil {
panic(err)
}
With these changes, your program runs and outputs (try it on the Go Playground):
byte array before unzipping: [31 139 8 0 0 0 0 0 0 255 226 249 223 200 196 200 244 191 137 129 145 133 129 129 243 127 19 3 43 19 11 27 7 23 32 0 0 255 255 110 125 126 12 23 0 0 0]
final int Array content: [1 2 3 4 5]
As a side note: buf := bytes.NewBuffer(compressionBuffer.Bytes()) is also completely unnecessary; you can just start decoding compressionBuffer itself, since data previously written to a bytes.Buffer can be read back from it.
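Putting all of the above together, the cleaned-up program might look like this (a sketch combining the fixes; not necessarily identical to the Playground version mentioned above):

package main

import (
    "bytes"
    "compress/gzip"
    "encoding/gob"
    "fmt"
)

func main() {
    arry := []int{1, 2, 3, 4, 5}

    // Gob-encode directly into the in-memory buffer; no bufio.Writer, nothing to flush.
    var indexBuffer bytes.Buffer
    if err := gob.NewEncoder(&indexBuffer).Encode(arry); err != nil {
        panic(err)
    }

    // Compress it; close the gzip writer before reading so all
    // compressed data is flushed to compressionBuffer.
    var compressionBuffer bytes.Buffer
    compressor := gzip.NewWriter(&compressionBuffer)
    if _, err := compressor.Write(indexBuffer.Bytes()); err != nil {
        panic(err)
    }
    if err := compressor.Close(); err != nil {
        panic(err)
    }

    fmt.Println("byte array before unzipping: ", compressionBuffer.Bytes())

    // Decompress and gob-decode straight from compressionBuffer.
    reader, err := gzip.NewReader(&compressionBuffer)
    if err != nil {
        panic(err)
    }
    defer reader.Close()

    var intArray []int
    if err := gob.NewDecoder(reader).Decode(&intArray); err != nil {
        panic(err)
    }
    fmt.Println("final int Array content: ", intArray)
}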
As you might have noticed, the compressed data is much larger than the initial, uncompressed data. There are several reasons for this: both the encoding/gob and compress/gzip streams carry significant overhead, and they (may) only make the input smaller at larger scale (five int values don't qualify).
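If you want to see that overhead for yourself, you could print the sizes at each stage (a rough sketch, reusing indexBuffer and compressionBuffer from the program above, measured right after the compressor is closed):

// 5 small ints need only a few bytes, yet both streams are much larger:
fmt.Println("gob bytes: ", indexBuffer.Len())       // the gob stream includes type info
fmt.Println("gzip bytes:", compressionBuffer.Len()) // gzip adds a header and trailer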
Please check this related question: Efficient Go serialization of struct to disk
For small arrays, you may also consider variable-length encoding; see binary.PutVarint().
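A minimal sketch of that idea (a hand-rolled format for []int, not a drop-in replacement for gob; the encodeVarints / decodeVarints helpers are made up for illustration):

package main

import (
    "encoding/binary"
    "fmt"
)

// encodeVarints packs the values using variable-length encoding;
// small values take as little as one byte each.
func encodeVarints(values []int) []byte {
    buf := make([]byte, binary.MaxVarintLen64*len(values))
    n := 0
    for _, v := range values {
        n += binary.PutVarint(buf[n:], int64(v))
    }
    return buf[:n]
}

// decodeVarints reverses encodeVarints.
func decodeVarints(data []byte) []int {
    var values []int
    for len(data) > 0 {
        v, n := binary.Varint(data)
        if n <= 0 {
            panic("invalid varint data")
        }
        values = append(values, int(v))
        data = data[n:]
    }
    return values
}

func main() {
    data := encodeVarints([]int{1, 2, 3, 4, 5})
    fmt.Println(len(data), "bytes:", data) // 5 bytes: [2 4 6 8 10] (zig-zag encoded)
    fmt.Println(decodeVarints(data))       // [1 2 3 4 5]
}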