Tags: go, out-of-memory

Can't allocate memory to read a file in Go


I'm new to Go, so I may not understand some things yet. I want to rewrite my Python code that merges dbf files into a single one, but when reading large dbf files of about 1.5 GB I get the error:

runtime: out of memory: cannot allocate 1073741824-byte block (1077805056 in use)
fatal error: out of memory

The error occurs on the line "result.Write(buf[:n])". I am using Windows 10, go version go1.21.6 windows/386, 16 GB of RAM, and an Intel Core i5-11400.

import (
    "bytes"
    "io"
    "os"
)

func readFile(filename string) ([]byte, error) {
    f, err := os.Open(filename)
    if err != nil {
        return nil, err
    }
    defer f.Close()
    var result bytes.Buffer
    buf := make([]byte, 4096)
    for {
        // Read at least 1 byte per iteration until the file is exhausted.
        n, err := io.ReadAtLeast(f, buf, 1)
        if err != nil {
            if err == io.EOF {
                break
            }
            return nil, err
        }
        // Accumulate everything in an in-memory buffer; this is the failing line.
        result.Write(buf[:n])
    }
    return result.Bytes(), nil
}

I want to read my file and get a byte slice to pass it on.


Solution

  • Basically, what @icza said:

    windows/386 is the 32-bit version of the Go compiler, which has inherently limited memory management capabilities: a 32-bit process can address only around 2-4 GB. You should definitely use the 64-bit version! Download and use the amd64 release.

    I would add that your approach to reading the file is a bit naive: bytes.Buffer is just a convenient wrapper around []byte. It maintains a single contiguous slice of bytes, which means it has to reallocate its memory periodically as you keep Write-ing into it, essentially performing slice = append(slice, newData) on each call. This approach has two properties:

    • append over-allocates memory so that a run of consecutive append calls on the same slice does not trigger a true reallocation every time.
    • Reallocation means allocating a new chunk of memory that is larger than the existing one and then copying the data over from the existing chunk. This means that if you have a slice of 1 GiB and try to append a 4 KiB chunk to it, append will allocate a new array of 1 GiB plus some megabytes and then copy over that gigabyte of data. The old array then becomes garbage that is not yet collected, and from the point of view of the OS kernel it still appears to be allocated memory. So, naturally, when you are close to the limit of how much a 32-bit program can allocate on a given architecture, another attempt to reallocate a big enough slice may legitimately fail (see the sketch after this list).
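
    To see this in action, here is a minimal, runnable sketch (mine, not part of the original answer) that prints every time append has to grow a slice's backing array; the exact capacities are implementation details of the Go runtime and vary between versions:

        package main

        import "fmt"

        func main() {
            var s []byte
            prevCap := 0
            for i := 0; i < 1<<20; i++ {
                s = append(s, 0) // the same pattern bytes.Buffer performs internally
                if cap(s) != prevCap {
                    // A capacity change means append allocated a bigger
                    // array and copied all existing bytes into it.
                    fmt.Printf("len=%8d cap=%8d (reallocated)\n", len(s), cap(s))
                    prevCap = cap(s)
                }
            }
        }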

    One approach to work around this issue is to call Stat on the opened file, allocate a slice of the reported file size, and then use io.ReadFull to read the whole file in one go. Or you can just use os.ReadFile, which does essentially that for you.
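
    A minimal sketch of that Stat-then-ReadFull approach (the function name readFileAtOnce is mine, for illustration):

        package main

        import (
            "io"
            "os"
        )

        // readFileAtOnce sizes the buffer from Stat and fills it with a single
        // io.ReadFull call, so no reallocation ever happens; os.ReadFile does
        // essentially the same thing.
        func readFileAtOnce(filename string) ([]byte, error) {
            f, err := os.Open(filename)
            if err != nil {
                return nil, err
            }
            defer f.Close()

            fi, err := f.Stat()
            if err != nil {
                return nil, err
            }

            buf := make([]byte, fi.Size()) // one allocation, exactly file-sized
            if _, err := io.ReadFull(f, buf); err != nil {
                return nil, err
            }
            return buf, nil
        }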

    Another, better approach is to not "slurp" the whole file into memory at all if that is not needed: it is way better to open the source file and the destination file and then call io.Copy on the pair (see the concatenation sketch further below).

    From your question, it's not clear whether this approach is viable or not. If you need to

    merge dbf files

    (emphasis mine) you might want to keep in memory only the keys of the data records and then perform the merge on those keys. That basically results in a data structure which maps each key to the file the data for that key has to be taken from; you would then iterate over the keys, reading their data and writing it to the resulting file as you go.
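
    To make that idea concrete, here is a toy, self-contained sketch of such a key-to-file index; the file names and keys below are made up, and real code would extract the keys with an actual dbf parser:

        package main

        import "fmt"

        func main() {
            // Hypothetical keys found in each source file.
            fileKeys := map[string][]string{
                "a.dbf": {"k1", "k2"},
                "b.dbf": {"k2", "k3"},
            }

            // Build the merge index: key -> file to take the record from.
            // Iterating the files in a fixed order makes later files win
            // when the same key appears more than once.
            index := make(map[string]string)
            for _, fname := range []string{"a.dbf", "b.dbf"} {
                for _, key := range fileKeys[fname] {
                    index[key] = fname
                }
            }

            // The actual merge would now read each record from index[key]
            // and append it to the output file.
            for key, fname := range index {
                fmt.Printf("take %q from %s\n", key, fname)
            }
        }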

    If, instead, you have used an unfortunate term and actually mean concatenating two data files, then two plain calls to io.Copy would do the trick.
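
    Here is a sketch of that, assuming plain byte-level concatenation is really what you want (the helper concatFiles is mine):

        package main

        import (
            "io"
            "os"
        )

        // concatFiles streams src1 and then src2 into dst. io.Copy moves the
        // data in fixed-size chunks, so memory use stays constant no matter
        // how large the files are.
        func concatFiles(dst, src1, src2 string) error {
            out, err := os.Create(dst)
            if err != nil {
                return err
            }
            for _, name := range []string{src1, src2} {
                in, err := os.Open(name)
                if err != nil {
                    out.Close()
                    return err
                }
                _, err = io.Copy(out, in)
                in.Close()
                if err != nil {
                    out.Close()
                    return err
                }
            }
            return out.Close() // Close may report a delayed write error
        }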

    One last note: you might want to wrap the source file(s) in a bufio.Reader to get decent read performance; using a bufio.Writer for the resulting file can help as well. Without this, the I/O performance might surprise you.
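
    For completeness, a small sketch of that bufio wrapping; the 1 MiB buffer sizes are arbitrary illustrations, and the final Flush matters because buffered bytes reach the file only when flushed:

        package main

        import (
            "bufio"
            "io"
            "os"
        )

        // copyBuffered copies src to dst through bufio wrappers so that many
        // small reads and writes are batched into fewer, larger system calls.
        func copyBuffered(dstName, srcName string) error {
            src, err := os.Open(srcName)
            if err != nil {
                return err
            }
            defer src.Close()

            dst, err := os.Create(dstName)
            if err != nil {
                return err
            }
            defer dst.Close()

            br := bufio.NewReaderSize(src, 1<<20) // 1 MiB read buffer
            bw := bufio.NewWriterSize(dst, 1<<20) // 1 MiB write buffer
            if _, err := io.Copy(bw, br); err != nil {
                return err
            }
            return bw.Flush() // push any buffered bytes to the file
        }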