Search code examples
arraysswiftperformancedictionarynsdictionary

Swift enormous dictionary of arrays, very slow


I am working on a project in Swift, using a dictionary.

This dictionary is of the type [String : [Posting]]. I have around 200k different "terms" (keys) to insert in it, and for each term I have around 500 to 1000 objects to append in a list. I know it's a weird practice but I don't have the choice and I must deal with all those elements.

The issue is that this is very very slow, as the dictionary gets bigger. I tried switching to a NSMutableDictionary, no luck.

My addTerm function is called everytime I need to insert an element :

   func addTerm(_ term: String, withId id: Int, atPosition position: Int) {

        if self.map[term] == nil {
            self.map[term] = [Posting]()
        }

        if self.map[term]!.last?.documentId == id {
            self.map[term]!.last?.addPosition(position)
        }
        else {
            self.map[term]!.append(Posting(withId: id, atPosition: position, forTerm: term))
        }
    }

EDIT: I realized that its not the dictionary that causes all this lag, but its actually the arrays it contains. Arrays re-allocate way too much when adding new elements, and the best I could was to replace them with ContiguousArray.


Solution

  • This is fairly common performance trap, as also observed in:

    The issue stems from the fact that the array you're mutating in the expression self.map[term]!.append(...) is a temporary mutable copy of the underlying array in the dictionary's storage. This means that the array is never uniquely referenced and so always has its buffer re-allocated.

    This situation will fixed in Swift 5 with the unofficial introduction of generalised accessors, but until then, one solution (as mentioned in both the above Q&As) is to use Dictionary's subscript(_:default:) which from Swift 4.1 can mutate the value directly in storage.

    Although your case isn't quite a straightforward case of applying a single mutation, so you need some kind of wrapper function in order to allow you to have scoped access to your mutable array.

    For example, this could look like:

    class X {
    
      private var map: [String: [Posting]] = [:]
    
      private func withPostings<R>(
        forTerm term: String, mutations: (inout [Posting]) throws -> R
      ) rethrows -> R {
        return try mutations(&map[term, default: []])
      }
    
      func addTerm(_ term: String, withId id: Int, atPosition position: Int) {
    
        withPostings(forTerm: term) { postings in
          if let posting = postings.last, posting.documentId == id {
            posting.addPosition(position)
          } else {
            postings.append(Posting(withId: id, atPosition: position, forTerm: term))
          }
        }
    
      }
      // ...
    }