Search code examples
pythondata-structurespython-3.xdeque

How can I implement a data structure that preserves order and has fast insertion/removal?


I'm looking for a data structure that preserves the order of its elements (which may change over the life of the data structure, as the client may move elements around).

It should allow fast search, insertion before/after a given element, removal of a given element, lookup of the first and last elements, and bidirectional iteration starting at a given element.

What would be a good implementation?

Here's my first attempt:

A class deriving from both collections.abc.Iterable and collections.abc.MutableSet that contains a linked list and a dictionary. The dictionary's keys are elements, values are nodes in the linked list. The dictionary would handle search for a node given an element. Once an element is found, the linked list would handle insertion before/after, deletion, and iteration. The dictionary would be updated by adding or deleting the relevant key/value pair. Clearly, with this approach the elements must be hashable and unique (or else, we'll need another layer of indirection where each element is represented by an auto-assigned numeric identifier, and only those identifiers are stored as keys).

It seems to me that this would be strictly better in asymptotic complexity than either list or collections.deque, but I may be wrong. [EDIT: Wrong, as pointed out by @roliu. Unlike list or deque, I would not be able to find an element by its numeric index in O(1). As of now, it is O(N) but I am sure there's some way to make it O(log N) if necessary.]


Solution

  • A slightly modified version of Raymond Hettinger's OrderedSet recipe seems to satisfy all my requirements. I only added support for position-based access and read/write.

    # changes vs. original recipe at http://code.activestate.com/recipes/576696/:
    # added a position parameter to add
    # changed how pop works, and added popleft
    # added find, get_start, get_end, next_pos, prev_pos, __getitem__, __setitem__
    
    class OrderedSetPlus(collections.MutableSet, collections.Iterable):
        '''
        >>> oset = OrderedSetPlus([3, 3, 3, 2, 1, 8, 8])
        >>> oset.add(13)
        >>> p = oset.find(2)
        >>> oset.add(15, p)
        >>> oset
        OrderedSetPlus([3, 15, 2, 1, 8, 13])
        >>> p = oset.next_pos(p)
        >>> oset[p]
        1
        >>> oset.add(7, p)
        >>> oset
        OrderedSetPlus([3, 15, 2, 7, 1, 8, 13])
        >>> oset[p] = 20
        >>> oset
        OrderedSetPlus([3, 15, 2, 7, 20, 8, 13])
        '''
    
        class DuplicateElement(Exception):
            pass
    
        def __init__(self, iterable=None):
            self.end = end = [] 
            end += [None, end, end]         # sentinel node for doubly linked list
            self.map = {}                   # key --> [key, prev, next]
            if iterable is not None:
                self |= iterable
    
        def __len__(self):
            return len(self.map)
    
        def __contains__(self, key):
            return key in self.map
    
        def find(self, key):
            return self.map.get(key, None)
    
        # inserts element before the specified position
        # if pos is None, inserts at the end
        # position can only be obtained by calling instance methods
        def add(self, key, pos = None):
            if pos is None:
                pos = self.end
            if key not in self.map:
                curr = pos[PREV]
                curr[NEXT] = pos[PREV] = self.map[key] = [key, curr, pos]
    
        def discard(self, key):
            if key in self.map:        
                key, prev, next = self.map.pop(key)
                prev[NEXT] = next
                next[PREV] = prev
    
        def __iter__(self):
            end = self.end
            curr = end[NEXT]
            while curr is not end:
                yield curr[KEY]
                curr = curr[NEXT]
    
        def get_end(self):
            return self.end[PREV]
    
        def get_start(self):
            return self.end[NEXT]
    
        def next_pos(self, pos):
            pos = pos[NEXT]
            return None if pos is self.end else pos
    
        def prev_pos(self, pos):
            pos = pos[PREV]
            return None if pos is self.end else pos
    
        def __getitem__(self, pos):
            return pos[KEY]
    
        def __setitem__(self, pos, key):
            if key in self.map:
                raise DuplicateElement
            pos[KEY] = key
    
        def __reversed__(self):
            end = self.end
            curr = end[PREV]
            while curr is not end:
                yield curr[KEY]
                curr = curr[PREV]
    
        def popleft(self):
            return self.pop(pos = self.get_start())
    
    
        def pop(self, pos=None):
            if not self:
                raise IndexError()
            if pos is None:
                pos = self.get_end()
            key = self[pos]
            #key = next(reversed(self)) if last else next(iter(self))
            self.discard(key)
            return key
    
        def __repr__(self):
            return '{}({})'.format(self.__class__.__name__, list(self))
    
        def __eq__(self, other):
            if isinstance(other, OrderedSet):
                return len(self) == len(other) and list(self) == list(other)
            return set(self) == set(other)