Search code examples
dictionarypointersgohashmap

Go's maps under the hood


After reading Dave Cheney's blogpost about Go's maps there is still few things unclear to me.

TLDR:

  • Why are they unordered?
  • Where actual values are stored in memory?

After digging in runtime package I found out that underlying map structure is following:

// A header for a Go map.
type hmap struct {
    // Note: the format of the hmap is also encoded in cmd/compile/internal/gc/reflect.go.
    // Make sure this stays in sync with the compiler's definition.
    count     int // # live cells == size of map.  Must be first (used by len() builtin)
    flags     uint8
    B         uint8  // log_2 of # of buckets (can hold up to loadFactor * 2^B items)
    noverflow uint16 // approximate number of overflow buckets; see incrnoverflow for details
    hash0     uint32 // hash seed

    buckets    unsafe.Pointer // array of 2^B Buckets. may be nil if count==0.
    oldbuckets unsafe.Pointer // previous bucket array of half the size, non-nil only when growing
    nevacuate  uintptr        // progress counter for evacuation (buckets less than this have been evacuated)

    extra *mapextra // optional fields
}

buckets - is array of buckets where indexes is low-order bits of key's hash, where the bucket is:

// A bucket for a Go map.
type bmap struct {
    // tophash generally contains the top byte of the hash value
    // for each key in this bucket. If tophash[0] < minTopHash,
    // tophash[0] is a bucket evacuation state instead.
    tophash [bucketCnt]uint8
    // Followed by bucketCnt keys and then bucketCnt elems.
    // NOTE: packing all the keys together and then all the elems together makes the
    // code a bit more complicated than alternating key/elem/key/elem/... but it allows
    // us to eliminate padding which would be needed for, e.g., map[int64]int8.
    // Followed by an overflow pointer.
}

..well it's just array of uint8 where every item is first byte of key's hash. And key-value pairs are stores as key/key value/value (eight pairs per bucket). But where exactly? Considering that map may contain value of (almost) any type. There should be some kind of pointer to place in memory where array of values stored, but bmap doesn't have such info.

And since key's hashes are located in ordered array inside bucket, why it's order different every time I looping over map?


Solution

    • Why are they unordered?

    Because this gives greater freedom to the runtime to implement the map type. Although we know Go's (current) implementation is a hashmap, the language specification allows to use any map implementation like hash map, tree map etc. Also not having to remember the order, this allows the runtime to do its job more effectively and using less memory.

    Adrian's comment nicely summarizes that order is rarely needed, and it would be a waste to always maintain order. When you do need order, you may use a data structure that provides the ordering. For examples, see Map in order range loop.

    And since key's hashes are located in ordered array inside bucket, why it's order different every time I looping over map?

    The Go authors intentionally made map's iteration order randomized (so we mortals don't get dependent on a fixed order). For more, see In Golang, why are iterations over maps random?

    Also see related: Why can't Go iterate maps in insertion order?

    • Where actual values are stored in memory?

    The "where" is specified by hmap.buckets. This is a pointer value, it points to an array in memory, an array holding the buckets.

    buckets    unsafe.Pointer // array of 2^B Buckets. may be nil if count==0.
    

    So hmap.buckets points to a contiguous memory segment holding buckets. A bucket is "modeled" by bmap, but this is not its actual memory layout. A bucket starts with an array holding top hash bytes of keys being in the bucket (tophash [bucketCnt]uint8), and this array is followed by bucketCnt keys of the bucket, which is then followed by bucketCnt values of the bucket. Lastly there is an overflow pointer.

    Think of the bucket like this conceptual type, which "visualizes" where keys and values are located in memory:

    type conceptualBucket struct {
        tophash     [bucketCnt]uint8
        keys        [bucketCnt]keyType
        values      [bucketCnt]valueType
        overflowPtr uintptr
    }
    

    Note: bucketCnt is a compile time constant being 8, it is the maximum number of key/elem pairs a bucket can hold.

    Of course this "picture" is inaccurate, but it gives the idea where / how keys and values are stored.