Search code examples
pythonlistdictionaryhashable

Efficiently mapping unhashable objects to their index in a list


A Python list

f = [x0, x1, x2]

may be seen as an efficient representation of a mapping from [0, 1, ..., len(f) - 1] to the set of its elements. By "efficient" I mean that f[i] returns the element associated with i in O(1) time.

The inverse mapping may be defined as follows:

class Inverse:
    def __init__(self, f):
        self.f = f

    def __getitem__(self, x):
        return self.f.index(x)

This works, but Inverse(f)[x] takes O(n) time on average.

Alternatively, one may use a dict:

f_inv = {x: i for i, x in enumerate(f)}

This has O(1) average time complexity, but it requires the objects in the list to be hashable.

Is there a way to define an inverse mapping that provides equality-based lookups, in O(1) average time, with unhashable objects?

Edit: sample input and expected output:

>>> f = [x0, x1, x2]
>>> f_inv = Inverse(f)  # this is to be defined
>>> f_inv[x0]  # in O(1) time
0
>>> f_inv[x2]  # in O(1) time
2

Solution

  • You can create an associated dictionary mapping the object ID's back to the list index.

    The obvious disadvantage is that you will have to search the index for the identity object, not for on eobject that is merely equal.

    On the upside, by creating a custom MutableSequence class using collections.abc, you can, with minimal code, write a class that keeps your data both as a sequence and as the reverse dictionary.

    from collections.abc import MutableSequence
    from threading import RLock
    
    
    class MD(dict):
        # No need for a full MutableMapping subclass, as the use is limited
        def __getitem__(self, key):
            return super().__getitem__(id(key))
    
    
    class Reversible(MutableSequence):
        def __init__(self, args):
            self.seq = list()
            self.reverse = MD()
            self.lock = RLock()
            for element in args:
                self.append(element)
    
        def __getitem__(self, index):
            return self.seq[index]
    
        def __setitem__(self, index, value):
            with self.lock:
                del self.reverse[id(self.seq[index])]
                self.seq[index] = value
                self.reverse[id(value)] = index
    
        def __delitem__(self, index):
            if index < 0:
                index += len(self)
            with self.lock:
                # Increase all mapped indexes
                for obj in self.seq[index:]:
                    self.reverse[obj] -= 1
                del self.reverse[id(self.seq[index])]
                del self.seq[index]
    
        def __len__(self):
            return len(self.seq)
    
        def insert(self, index, value):
            if index < 0:
                index += len(self)
            with self.lock:
                # Increase all mapped indexes
                for obj in self.seq[index:]:
                    self.reverse[obj] += 1
                self.seq.insert(index, value)
                self.reverse[id(value)] = index
    

    And voilá: just use this object in place of your list, and the public attribute "reverse" to get the index of identity objects. Perceive you can augment the "intelligence" of the "MD" class by trying to use different strategies, like to use the objects themselves, if they are hashable, and only resort to id, or other custom key based on other object attributes, when needed. That way you could mitigate the need for the search to be for the same object.

    So, for ordinary operations on the list, this class maintain the reverted dictionary synchronized. There is no support for slice indexing, though. For more information, check the docs at https://docs.python.org/3/library/collections.abc.html