Why does Go use less memory for a slice of length 100k, than for an array of length 100k?

Consider the following code where I allocate 4 thousand arrays, each of length 100k:

    parentMap := make(map[int][100_000]int)
    for i := 0; i < 4000; i++ {
        parentMap[i] = [100_000]int{}
        time.Sleep(3 * time.Millisecond)
    }

If I run that locally, and analyze its memory usage, it starts to use >2GB of memory.

Now, if we changed the code slightly to use a slices of arrays (but also of length 100k) like so:

    parentMap := make(map[int][]int)
    for i := 0; i < 4000; i++ {
        parentMap[i] = make([]int, 100_000)
        time.Sleep(3 * time.Millisecond)
    }

On my machine, memory peaks at about 73MB. Why is this?

I thought that both of these snippets would use roughly equal memory with the following reasoning:

In both cases, the Go runtime will allocate the values of parentMap on the heap. Go does this because if it allocated these values on the stack, then the values of parentMap would all clear as soon as the current function went out of scope.
Therefore the first snippet allocates 4k arrays directly on the heap.
And, the second snippet allocates 4k slice headers on the heap. Each slice header has a pointer to a unique array (also on the heap) of size 100k.
In both cases, there are 4k arrays on the heap of size 100k. And therefore, roughly equal amount of memory should be used in either case.

I read: https://go.dev/blog/slices-intro. But couldn't find an implementation detail that explains this.

Solution

The version with the slices is probably benefiting from lazy allocation. Nothing ever tries to write to the data buffer of one of those slices, so the OS is free to not actually allocate the memory for those buffers until something actually tries to write. (The OS can zero-initialize the buffers lazily too, so that doesn't force an allocation.)

Meanwhile, the version with the arrays needs to actually copy the arrays into the map, and that means actually performing writes. Even though the values written are all zeros, they're still writes, so the OS has to actually allocate memory for the data to be written to.

Try writing data to those slices, and the slice version should take gigabytes of memory too. (I think one value per page of memory should be enough, but it may be easier to just fill the slices with 1s.)