Search code examples
pythondictionaryhashtable

Why is fast lookup possible for dict.items()?


Suppose we have a dictionary d defined as follows

d = {i:i for i in range(1,1000001)}

And then we store d.items() in x. So, x is a collection of tuples that contain key-value pairs for each element of d. I made a list with the same tuples, defined as follows :

l = [(i, i) for i in range(1,1000001)]

Now, I compared the time taken to execute (1000001, 1000001) in x and (1000001, 1000001) in l. The time difference was very significant. I assumed that this was because x had a set-like behavior and implemented hash tables on tuples, as tuples are hashable as well.

However, tuples that contain lists are non hashable, so I tried the same thing with this dictionary :

d = {i:[i,i+1] for i in range(1,1000001)}

Unlike what I expected, it still enabled fast lookup.

How does this work?

PS : Here's my code for time comparison

import time

def for_dict_items(n):
    #d = {i:i for i in range(1, n+1)}
    d = {i:[i, i+1] for i in range(1, n+1)}
    di = d.items()
    st = time.time()
    x = (n, [n, n+1]) in di
    et = time.time()
    return (et - st)


def for_tuples_list(n):
    #l = [(i, i) for i in range(1, n+1)]
    l = [(i,[i, i+1]) for i in range(1, n+1)]
    st = time.time()
    x = (n, [n, n+1]) in l
    et = time.time()
    return (et - st)


k = 1000000

t1 = for_dict_items(k)
t2 = for_tuples_list(k)

print(t1, t2, t2/t1, sep = "\n")

Solution

  • Since dict is one of the most important core objects, it's implemented in C, as well as dict_values, dict_keys and dict_items. Here's the CPython implementation of dictitems_contains:

    static int
    dictitems_contains(PyObject *self, PyObject *obj)
    {
        _PyDictViewObject *dv = (_PyDictViewObject *)self;
        int result;
        PyObject *key, *value, *found;
        if (dv->dv_dict == NULL)
            return 0;
        if (!PyTuple_Check(obj) || PyTuple_GET_SIZE(obj) != 2)
            return 0;
        key = PyTuple_GET_ITEM(obj, 0);
        value = PyTuple_GET_ITEM(obj, 1);
        result = PyDict_GetItemRef((PyObject *)dv->dv_dict, key, &found);
        if (result == 1) {
            result = PyObject_RichCompareBool(found, value, Py_EQ);
            Py_DECREF(found);
        }
        return result;
    }
    

    If you don't speak C, here's what it does:

    • Check that view is correct - namely that it has a reference to the underlying dictionary (and return False if not, for some reason)
    • Check the input: if it's not a 2-tuple, return False.
    • Look up a key in the underlying dictionary.
    • If found, check if value matches. If it does, return True.
    • Otherwise, return False.

    The core of this is the fact that dict.items returns not a list, but a special dict_items view. The term view here means that it does not represent a copy, but only a proxy to retrieve underlying items in some specific way. The same applies to dict_keys and dict_values, you can read more in the excellent documentation page on this topic.

    Note that this is specific to CPython implementation and may be wrong for other implementations.