Search code examples
pythonlistcaching

Why is Python list slower when sorted?


In the following code, I create two lists with the same values: one list unsorted (s_not), the other sorted (s_yes). The values are created by randint(). I run some loop for each list and time it.

import random
import time

for x in range(1,9):

    r = 10**x # do different val for the bound in randint()
    m = int(r/2)

    print("For rand", r)

    # s_not is non sorted list
    s_not = [random.randint(1,r) for i in range(10**7)]

    # s_yes is sorted
    s_yes = sorted(s_not)

    # do some loop over the sorted list
    start = time.time()
    for i in s_yes:
        if i > m:
            _ = 1
        else:
            _ = 1
    end = time.time()
    print("yes", end-start)

    # do the same to the unsorted list
    start = time.time()
    for i in s_not:
        if i > m:
            _ = 1
        else:
            _ = 1
    end = time.time()
    print("not", end-start)

    print()

With output:

For rand 10
yes 1.0437555313110352
not 1.1074268817901611

For rand 100
yes 1.0802974700927734
not 1.1524150371551514

For rand 1000
yes 2.5082249641418457
not 1.129960298538208

For rand 10000
yes 3.145440101623535
not 1.1366300582885742

For rand 100000
yes 3.313387393951416
not 1.1393756866455078

For rand 1000000
yes 3.3180911540985107
not 1.1336982250213623

For rand 10000000
yes 3.3231537342071533
not 1.13503098487854

For rand 100000000
yes 3.311596393585205
not 1.1345293521881104

So, when increasing the bound in the randint(), the loop over the sorted list gets slower. Why?


Solution

  • Cache misses. When N int objects are allocated back-to-back, the memory reserved to hold them tends to be in a contiguous chunk. So crawling over the list in allocation order tends to access the memory holding the ints' values in sequential, contiguous, increasing order too.

    Shuffle it, and the access pattern when crawling over the list is randomized too. Cache misses abound, provided there are enough different int objects that they don't all fit in cache.

    At r==1, and r==2, CPython happens to treat such small ints as singletons, so, e.g., despite that you have 10 million elements in the list, at r==2 it contains only (at most) 100 distinct int objects. All the data for those fit in cache simultaneously.

    Beyond that, though, you're likely to get more, and more, and more distinct int objects. Hardware caches become increasingly useless then when the access pattern is random.

    Illustrating:

    >>> from random import randint, seed
    >>> seed(987987987)
    >>> for x in range(1, 9):
    ...     r = 10 ** x
    ...     js = [randint(1, r) for _ in range(10_000_000)]
    ...     unique = set(map(id, js))
    ...     print(f"{r:12,} {len(unique):12,}")
    ...     
              10           10
             100          100
           1,000    7,440,909
          10,000    9,744,400
         100,000    9,974,838
       1,000,000    9,997,739
      10,000,000    9,999,908
     100,000,000    9,999,998