I'm new to Go, so I may not understand some things yet. I want to rewrite my Python code that merges dbf files into a single one, but when reading large dbf files of about 1.5 GB I get this error:
runtime: out of memory: cannot allocate 1073741824-byte block (1077805056 in use) fatal error: out of memory
The error occurs on the line result.Write(buf[:n]). I am using Windows 10, go version go1.21.6 windows/386, 16 GB RAM, and an Intel Core i5-11400.
import (
    "bytes"
    "io"
    "os"
)

func readFile(filename string) ([]byte, error) {
    f, err := os.Open(filename)
    if err != nil {
        return nil, err
    }
    defer f.Close()

    var result bytes.Buffer
    buf := make([]byte, 4096)
    for {
        n, err := io.ReadAtLeast(f, buf, 1)
        if err != nil {
            if err == io.EOF {
                break
            }
            return nil, err
        }
        result.Write(buf[:n])
    }
    return result.Bytes(), nil
}
I want to read my file and get a byte slice to pass it on.
Basically, what @icza said: windows/386 is the 32-bit version of the Go compiler, which has only a limited amount of memory to work with (somewhere around 2-4 GB per process). You should definitely use the 64-bit version: download and use the amd64 release.
I would add that your approach to reading a file is a bit naive: bytes.Buffer is just a convenient wrapper around a []byte; it maintains a single contiguous slice of bytes, which means it has to reallocate its memory periodically while you keep Write-ing into it, basically performing slice = append(slice, newData). This approach has two properties:

- append overallocates memory in an attempt to avoid a true reallocation on every consecutive call to append on the same data.
- When a reallocation does happen, say when the buffer already holds about 1 GiB, append would allocate a new array of 1 GiB plus some megabytes and then copy over that gigabyte worth of data. The old array becomes garbage which is not yet collected, and from the point of view of the OS kernel it appears to be still-allocated memory. So, naturally, when you're close to the limit of how much a 32-bit program can allocate on a given 32-bit architecture, another attempt to reallocate a big enough slice may legitimately fail.

One approach to work around this issue is to call Stat on the opened file, allocate a slice the size of the reported file size, and then use io.ReadFull to read the whole file in one go. Or you can just use os.ReadFile.
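A minimal sketch of the Stat + io.ReadFull variant (the function name is mine):

import (
    "io"
    "os"
)

// readWholeFile reads the whole file into a single, exactly-sized allocation:
// Stat reports the size up front, so no reallocation or copying is needed.
func readWholeFile(filename string) ([]byte, error) {
    f, err := os.Open(filename)
    if err != nil {
        return nil, err
    }
    defer f.Close()

    fi, err := f.Stat()
    if err != nil {
        return nil, err
    }

    buf := make([]byte, fi.Size())
    if _, err := io.ReadFull(f, buf); err != nil {
        return nil, err
    }
    return buf, nil
}

With os.ReadFile this collapses to a one-liner: data, err := os.ReadFile(filename).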
Another, better, approach is to just not "slurp" the whole file into memory if that is not needed: it's way better to open the source file and the destination file and then call io.Copy on both.
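A minimal sketch of that streaming approach, assuming you merely need to copy one file into another:

import (
    "io"
    "os"
)

func copyFile(srcName, dstName string) error {
    src, err := os.Open(srcName)
    if err != nil {
        return err
    }
    defer src.Close()

    dst, err := os.Create(dstName)
    if err != nil {
        return err
    }
    defer dst.Close()

    // io.Copy streams the data through a small internal buffer,
    // so memory use stays constant regardless of the file size.
    _, err = io.Copy(dst, src)
    return err
}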
From your question, it's not clear whether this approach is viable or not. If you need to merge dbf files (emphasis mine), you might want to keep in memory only the keys of the data records and perform the merge on those keys. That would basically give you a data structure which maps each key to the file its data has to be taken from; you would then iterate over the keys, reading each record's data and writing it to the resulting file as you go.
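A very rough sketch of that idea follows. readKeys and copyRecord are hypothetical helpers that would have to understand the dbf header and record layout; only the data flow is shown:

import (
    "bufio"
    "io"
    "os"
)

// Hypothetical helpers -- a real implementation would parse the dbf
// header and records; they are stubbed out here.
func readKeys(filename string) ([]string, error)           { return nil, nil }
func copyRecord(key, filename string, dst io.Writer) error { return nil }

func mergeFiles(filenames []string, outName string) error {
    // Map each key to the file its record should be taken from
    // (here, a later file simply wins on duplicate keys).
    source := make(map[string]string)
    var order []string // remember the first-seen order of the keys
    for _, fn := range filenames {
        keys, err := readKeys(fn)
        if err != nil {
            return err
        }
        for _, k := range keys {
            if _, seen := source[k]; !seen {
                order = append(order, k)
            }
            source[k] = fn
        }
    }

    out, err := os.Create(outName)
    if err != nil {
        return err
    }
    defer out.Close()
    w := bufio.NewWriter(out)

    // Iterate over the keys, copying each record from whichever
    // file it was mapped to, writing to the result as we go.
    for _, k := range order {
        if err := copyRecord(k, source[k], w); err != nil {
            return err
        }
    }
    return w.Flush()
}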
If, instead, you have used an unfortunate term to actually mean concatenating two data files, then two plain calls to io.Copy would do the trick.
One last note: you might want to wrap the source file(s) with a bufio.Reader to get decent read performance; using a bufio.Writer for the resulting file could help as well. Without this, the I/O performance might surprise you.
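Putting the concatenation and the buffering together, a sketch might look like this (the function name is mine):

import (
    "bufio"
    "io"
    "os"
)

func concatFiles(srcNames []string, dstName string) error {
    out, err := os.Create(dstName)
    if err != nil {
        return err
    }
    defer out.Close()

    // Buffer the writes to the destination file.
    w := bufio.NewWriter(out)

    for _, name := range srcNames {
        src, err := os.Open(name)
        if err != nil {
            return err
        }
        // Buffer the reads from each source file and stream it
        // into the destination with io.Copy.
        if _, err := io.Copy(w, bufio.NewReader(src)); err != nil {
            src.Close()
            return err
        }
        src.Close()
    }
    return w.Flush()
}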