Suppose we have a dictionary d
defined as follows
d = {i:i for i in range(1,1000001)}
And then we store d.items()
in x
. So, x
is a collection of tuples that contain key-value pairs for each element of d
. I made a list with the same tuples, defined as follows :
l = [(i, i) for i in range(1,1000001)]
Now, I compared the time taken to execute (1000001, 1000001) in x
and (1000001, 1000001) in l
. The time difference was very significant. I assumed that this was because x
had a set-like behavior and implemented hash tables on tuples, as tuples are hashable as well.
However, tuples that contain lists are non hashable, so I tried the same thing with this dictionary :
d = {i:[i,i+1] for i in range(1,1000001)}
Unlike what I expected, it still enabled fast lookup.
How does this work?
PS : Here's my code for time comparison
import time
def for_dict_items(n):
#d = {i:i for i in range(1, n+1)}
d = {i:[i, i+1] for i in range(1, n+1)}
di = d.items()
st = time.time()
x = (n, [n, n+1]) in di
et = time.time()
return (et - st)
def for_tuples_list(n):
#l = [(i, i) for i in range(1, n+1)]
l = [(i,[i, i+1]) for i in range(1, n+1)]
st = time.time()
x = (n, [n, n+1]) in l
et = time.time()
return (et - st)
k = 1000000
t1 = for_dict_items(k)
t2 = for_tuples_list(k)
print(t1, t2, t2/t1, sep = "\n")
Since dict is one of the most important core objects, it's implemented in C, as well as dict_values
, dict_keys
and dict_items
. Here's the CPython implementation of dictitems_contains
:
static int
dictitems_contains(PyObject *self, PyObject *obj)
{
_PyDictViewObject *dv = (_PyDictViewObject *)self;
int result;
PyObject *key, *value, *found;
if (dv->dv_dict == NULL)
return 0;
if (!PyTuple_Check(obj) || PyTuple_GET_SIZE(obj) != 2)
return 0;
key = PyTuple_GET_ITEM(obj, 0);
value = PyTuple_GET_ITEM(obj, 1);
result = PyDict_GetItemRef((PyObject *)dv->dv_dict, key, &found);
if (result == 1) {
result = PyObject_RichCompareBool(found, value, Py_EQ);
Py_DECREF(found);
}
return result;
}
If you don't speak C, here's what it does:
The core of this is the fact that dict.items
returns not a list, but a special dict_items
view. The term view here means that it does not represent a copy, but only a proxy to retrieve underlying items in some specific way. The same applies to dict_keys
and dict_values
, you can read more in the excellent documentation page on this topic.
Note that this is specific to CPython implementation and may be wrong for other implementations.