Search code examples
pythonpython-3.xalgorithmbinary-searchtreemap

Do time complexities differ between list and deque for Bisect/Insort?


As far as I know, list in Python is implemented using array, while deque is implemented using double linked list. In either case, the binary search of a certain value takes O(logn) time, but if we insert to that position, array takes O(n) while double linked list takes O(1).

So, can we use the combination of bisect, insort, and deque to implement all Dynamic Set Operations with time complexity comparable to TreeMap in Java?


Update: I tested it in this Leetcode question: https://leetcode.com/problems/time-based-key-value-store/submissions/

Quite the contrary to my expectation, when I switch from list to deque, the speed slowed down a lot.


Solution

  • To your title question: Yes, they do.

    To your hypothetical sorted set implementation question: No, you can't.

    One, you're mistaken on the implementation of deque; it's not a plain "item per node" linked list, it's a block of items per node (64 on the CPython reference interpreter, though that's an implementation detail). And aside from the head and tail blocks, internal blocks are never left empty, so insertion midway through the deque isn't particularly cheap, it still has to move a bunch of stuff around. It's not O(n) like a mid-list insertion, as it takes advantage of some efficiencies in rotation to rotate, append to one side or the other, then rotate back, but it's a far cry from insertion at a known point in a linked list, remaining O(n) (though with large constant divisors thanks to shuffling whole blocks being cheaper than moving each of the individual items).

    Two, each lookup in a deque is O(n), not O(1) like a list; it has a constant divisor of 64 as stated previously, and it's drops to O(1) near either end of the deque, but it's still O(n) in general, which scales poorly for large deques. bisect searches are O(log n) under the assumption that indexing the sequence is O(1); for a deque, they'd be O(n log n), as they'd perform log n O(n) indexing operations. This matches your results from testing; bisect+deque is significantly worse.

    Java's TreeMap isn't implemented in terms of binary search and a linked list in any event; linked lists are no good for this, since ultimately a full binary search must traverse back and forth enough that it does O(n) total work, even if it only has to compare against O(log n) elements. A tree map needs a tree structure of some sort, you can't just fake it with a linked list and a good algorithm.

    Built-in alternatives include:

    1. insort of a normal list: Sure it's O(n) overall, but the expensive part (finding where to insert) is O(log n), and it's only the "make room" step that's O(n), and it's a really cheap O(n) (basically a memcpy). Not acceptable for truly huge lists, but you'd be surprised how large a list you'd need before the overhead was noticeable against Python's slowness.
    2. Delayed, buffered sorting: If lookups are infrequent, but insertions are common, defer the sort until needed to minimize the number of sorting operations; just append the new elements to the end without sorting and set a "needs sorting" flag, and re-sort before a lookup when the flag is set. The TimSort algorithm does very well when the input is already mostly sorted (much closer to O(n) than a general purpose sort without optimizations for partially sorted typically can do), so it may be fine.
    3. If you only need the smallest element at any given time, the heapq module can do that with true O(log n) insertions and removals, and gets the minimum with O(1) (it's always index 0).
    4. Use a sqlite3 database (possible via shelve), indexed as appropriate; sqlite3 indices default to using a B-tree, meaning queries ordered using the index key(s) get the results back in sorted order "for free".

    Otherwise, you'll have to install a third-party module that provides a proper sorted set-like type.