Search code examples
xcodeswiftdictionarymemorytrie

Swift Dictionary Memory Consumption is Astronomical


Can anyone help shed some light on why the below code consumes well over 100 MB of RAM during runtime?

public struct Trie<Element : Hashable> {
    private var children: [Element:Trie<Element>]
    private var endHere : Bool

    public init() {
        children = [:]
        endHere  = false
    }
    public init<S : SequenceType where S.Generator.Element == Element>(_ seq: S) {
        self.init(gen: seq.generate())
    }
    private init<G : GeneratorType where G.Element == Element>(var gen: G) {
        if let head = gen.next() {
            (children, endHere) = ([head:Trie(gen:gen)], false)
        } else {
            (children, endHere) = ([:], true)
        }
    }
    private mutating func insert<G : GeneratorType where G.Element == Element>(var gen: G) {
        if let head = gen.next() {
            let _ = children[head]?.insert(gen) ?? { children[head] = Trie(gen: gen) }()
        } else {
            endHere = true
        }
    }
    public mutating func insert<S : SequenceType where S.Generator.Element == Element>(seq: S) {
        insert(seq.generate())
    }
}

var trie = Trie<UInt32>()
for i in 0..<300000 {
    trie.insert([UInt32(i), UInt32(i+1), UInt32(i+2)])
}

Based on my calculations total memory consumption for the above data structure should be somewhere around the following...

3 * count * sizeof(Trie<UInt32>)

Or –

3 * 300,000 * 9 = 8,100,000 bytes = ~8 MB

How is it that this data structure consumes well over 100 MB during runtime?


Solution

  • sizeof reports only the static footprint on the stack, which the Dictionary is just kind of a wrapper of the reference to its internal reference type implementation, and also the copy on write support. In other words, the key-value pairs and the hash table of your dictionary are allocated on the heap, which is not covered by sizeof. This applies to all other Swift collection types.

    In your case, you are creating three Trie - and indirectly three dictionaries - every iteration of the 300000. I wouldn't be surprised if the 96-byte allocations mentioned by @Macmade is the minimum overhead of a dictionary (e.g. its hash bucket).

    There might also be cost related to growing storage. So you may try to see if setting a minimumCapacity on the dictionary would help. On the other hand, if you do not need a divergent path generated per iteration, you may consider an indirect enum as an alternative, e.g.

    public enum Trie<Element> {
        indirect case Next(Element, Trie<Element>)
        case End
    }
    

    which should use less memory.