I'm studying algorithms and data structures at the moment.
I thought I would run a quick timeit.timeit
test for iterating through a list of 2**30 random integers in a list()
comparing to the same for the array.array
format.
I was expecting the array to finish first as one of the few muted benefits I have seen on other posts with a Python array is performance (I was initially wrongly under the impression that the list was implemented as a linked list: thank you for the correction Duncan)
Surely an array should be at least as quick as a list?
import os
import array
l = list(os.urandom(2**30))
a = array.array('I', l)
def test_list():
for i in l:
pass
def test_array():
for i in a:
pass
>>> timeit.timeit(test_array, number=5)
50.08525877200009
>>> timeit.timeit(test_list, number=5)
37.00491460799958
Here's my platform information: Python 3.6.5, [GCC 7.3.0] on linux x86_64 (Intel i5 4660)
First you initialise l
to a list of 2**30 Python int
values.
Second you initialise a
from the list to create a list of 2**30 C integers.
test_list
iterates over the list of Python int
values. No Python objects are created or destroyed in this process, just a reference counter on each one gets incremented and then decremented.
test_array
iterates over the list of C integers creating a new Python int
for each element and then destroying it again. That's why the array is slower: it creates and destroys 2**30 Python objects.
Internally a Python list is just an array of pointers to the objects it contains. That means iterating over the list is as simple and as fast as iterating over an array. The array
type here will be using less memory overall (or it would be if you hadn't held on to the list) as C integers are much smaller than Python objects, but each access into the array has to convert the C value into a Python object and while object creation is heavily optimised it still takes more time than just getting another reference to an existing object.