Search code examples
pythonpython-internals

Why is copying a shuffled list much slower?


Copying a shuffled range(10**6) list ten times takes me about 0.18 seconds: (these are five runs)

0.175597017661
0.173731403198
0.178601711594
0.180330912952
0.180811964451

Copying the unshuffled list ten times takes me about 0.05 seconds:

0.058402235973
0.0505464636856
0.0509734306934
0.0526022752744
0.0513324916184

Here's my testing code:

from timeit import timeit
import random

a = range(10**6)
random.shuffle(a)    # Remove this for the second test.
a = list(a)          # Just an attempt to "normalize" the list.
for _ in range(5):
    print timeit(lambda: list(a), number=10)

I also tried copying with a[:], the results were similar (i.e., big speed difference)

Why the big speed difference? I know and understand the speed difference in the famous Why is it faster to process a sorted array than an unsorted array? example, but here my processing has no decisions. It's just blindly copying the references inside the list, no?

I'm using Python 2.7.12 on Windows 10.

Edit: Tried Python 3.5.2 as well now, the results were almost the same (shuffled consistently around 0.17 seconds, unshuffled consistently around 0.05 seconds). Here's the code for that:

a = list(range(10**6))
random.shuffle(a)
a = list(a)
for _ in range(5):
    print(timeit(lambda: list(a), number=10))

Solution

  • The interesting bit is that it depends on the order in which the integers are first created. For example instead of shuffle create a random sequence with random.randint:

    from timeit import timeit
    import random
    
    a = [random.randint(0, 10**6) for _ in range(10**6)]
    for _ in range(5):
        print(timeit(lambda: list(a), number=10))
    

    This is as fast as copying your list(range(10**6)) (first and fast example).

    However when you shuffle - then your integers aren't in the order they were first created anymore, that's what makes it slow.

    A quick intermezzo:

    • All Python objects are on the heap, so every object is a pointer.
    • Copying a list is a shallow operation.
    • However Python uses reference counting so when an object is put in a new container it's reference count must be incremented (Py_INCREF in list_slice), so Python really needs to go to where the object is. It can't just copy the reference.

    So when you copy your list you get each item of that list and put it "as is" in the new list. When your next item was created shortly after the current one there is a good chance (no guarantee!) that it's saved next to it on the heap.

    Let's assume that whenever your computer loads an item in the cache it also loads the x next-in-memory items (cache locality). Then your computer can perform the reference count increment for x+1 items on the same cache!

    With the shuffled sequence it still loads the next-in-memory items but these aren't the ones next-in-list. So it can't perform the reference-count increment without "really" looking for the next item.

    TL;DR: The actual speed depends on what happened before the copy: in what order were these items created and in what order are these in the list.


    You can verify this by looking at the id:

    CPython implementation detail: This is the address of the object in memory.

    a = list(range(10**6, 10**6+100))
    for item in a:
        print(id(item))
    

    Just to show a short excerpt:

    1496489995888
    1496489995920  # +32
    1496489995952  # +32
    1496489995984  # +32
    1496489996016  # +32
    1496489996048  # +32
    1496489996080  # +32
    1496489996112
    1496489996144
    1496489996176
    1496489996208
    1496489996240
    1496507297840
    1496507297872
    1496507297904
    1496507297936
    1496507297968
    1496507298000
    1496507298032
    1496507298064
    1496507298096
    1496507298128
    1496507298160
    1496507298192
    

    So these objects are really "next to each other on the heap". With shuffle they aren't:

    import random
    a = list(range(10**6, 100+10**6))
    random.shuffle(a)
    last = None
    for item in a:
        if last is not None:
            print('diff', id(item) - id(last))
        last = item
    

    Which shows these are not really next to each other in memory:

    diff 736
    diff -64
    diff -17291008
    diff -128
    diff 288
    diff -224
    diff 17292032
    diff -1312
    diff 1088
    diff -17292384
    diff 17291072
    diff 608
    diff -17290848
    diff 17289856
    diff 928
    diff -672
    diff 864
    diff -17290816
    diff -128
    diff -96
    diff 17291552
    diff -192
    diff 96
    diff -17291904
    diff 17291680
    diff -1152
    diff 896
    diff -17290528
    diff 17290816
    diff -992
    diff 448
    

    Important note:

    I haven't thought this up myself. Most of the informations can be found in the blogpost of Ricky Stewart.

    This answer is based on the "official" CPython implementation of Python. The details in other implementations (Jython, PyPy, IronPython, ...) may be different. Thanks @JörgWMittag for pointing this out.