
What is the point of "sentinel object" pattern in Python


I recently learned about the "sentinel object" pattern in Python. I was taken by it and started to use it wherever I could. However, after I used it somewhere it wasn't needed, a coworker asked me about it, and now I can't see the use of it, given that "x in dict" exists. Here is a (truncated) canonical example, from the functools LRU cache implementation:

def _lru_cache_wrapper(user_function, maxsize, typed, _CacheInfo):
    # Constants shared by all lru cache instances:
    sentinel = object()          # unique object used to signal cache misses
    make_key = _make_key         # build a key from the function arguments
    PREV, NEXT, KEY, RESULT = 0, 1, 2, 3   # names for the link fields

    cache = {}
    hits = misses = 0
    full = False
    cache_get = cache.get    # bound method to lookup a key or return None
    cache_len = cache.__len__  # get cache size without calling len()
    lock = RLock()           # because linkedlist updates aren't threadsafe
    root = []                # root of the circular doubly linked list
    root[:] = [root, root, None, None]     # initialize by pointing to self

    if maxsize == 0:

        def wrapper(*args, **kwds):
            # No caching -- just a statistics update after a successful call
            nonlocal misses
            result = user_function(*args, **kwds)
            misses += 1
            return result

    elif maxsize is None:

        def wrapper(*args, **kwds):
            # Simple caching without ordering or size limit
            nonlocal hits, misses
            key = make_key(args, kwds, typed)
            result = cache_get(key, sentinel)
            if result is not sentinel:
                hits += 1
                return result
            result = user_function(*args, **kwds)
            cache[key] = result
            misses += 1
            return result

Now, just focusing on the part where the pattern is used:

            result = cache_get(key, sentinel)
            if result is not sentinel:
                hits += 1
                return result
            result = user_function(*args, **kwds)
            cache[key] = result
            misses += 1
            return result

As far as I can tell, this can be rewritten the following way:

            if key not in cache:
                result = user_function(*args, **kwds)
                cache[key] = result
                misses += 1
            else:
                result = cache_get(key)
                hits += 1
            return result

I wondered: what is the benefit of this sentinel method? I thought it might be efficiency. The Python wiki lists "x in s" as O(n) average case, but that entry is for lists; for a dict, membership testing and item lookup are both O(1) average case. So does this actually make a practical time difference?

I ran some quick tests on my laptop, and the runtimes are close, both in scenarios where most keys are hits and where most keys are misses.
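
Here is a minimal sketch of the kind of comparison, for reference (the dict size, key, and iteration count are arbitrary choices for illustration, and lookup_with_in / lookup_with_sentinel are made-up names):

import timeit

cache = {i: i for i in range(10_000)}
sentinel = object()   # unique object used to signal a missing key

def lookup_with_in(key=5_000):
    # Two hash lookups on a hit: one for the "in" test, one for the indexing.
    if key in cache:
        return cache[key]
    return None

def lookup_with_sentinel(key=5_000):
    # One hash lookup on a hit: get() does the test and the retrieval at once.
    result = cache.get(key, sentinel)
    return None if result is sentinel else result

print(timeit.timeit(lookup_with_in, number=1_000_000))
print(timeit.timeit(lookup_with_sentinel, number=1_000_000))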

In reply to @martineau, I don't think there is any additional functionality we get from this pattern, as demonstrated by this interactive session:

>>> d={1:None}
>>> if 1 in d:
...     print('one is there')
... 
one is there
>>> if 2 in d:
...     print('two is not')
... 
>>> d={1:None,None:3}
>>> if None in d:
...     print('we can find a none key as well')
... 
we can find a none key as well

So, the question remains: What is the point of this pattern?


Solution

  • In the code you show, using dict.get with a sentinel value is a small optimization for the case where the key does exist in the dictionary. In that case you only go through the hashing and key-lookup process once, in the get call, rather than twice as you would in an "if key in dict: value = dict[key]" equivalent.

    This doesn't change the computational complexity, since dictionary indexing and membership testing are both O(1) on average, but even small performance improvements can matter in "hot" code that runs very often. And that's exactly where memoization like that provided by the code you show is most useful!

    There are some other fairly common micro-optimizations you may see in Python code in the standard library. Your example contains another one: saving a bound method (cache.get) to a local variable (cache_get). This lets the code avoid looking the method up again each time it is needed, which would otherwise involve indexing into instance and class dictionaries and creating a bound method object. A rough sketch of the effect is below.
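
    As an illustration only (the dict, loop size, and function names here are placeholders, not code from functools), a sketch comparing the two styles:

    import timeit

    cache = {i: i for i in range(1_000)}
    sentinel = object()

    def lookup_via_attribute(n=1_000):
        # cache.get is looked up on the dict object on every iteration.
        for key in range(n):
            cache.get(key, sentinel)

    def lookup_via_local(n=1_000):
        # The method is looked up once and reused from a fast local variable,
        # mirroring the cache_get = cache.get line in the functools source.
        cache_get = cache.get
        for key in range(n):
            cache_get(key, sentinel)

    print(timeit.timeit(lookup_via_attribute, number=10_000))
    print(timeit.timeit(lookup_via_local, number=10_000))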