Search code examples
pythonpython-3.xreentrancy

Why declare a list twice in __init__?


I'm reading through the Python Docs, and, under Section 8.4.1, I found the following __init__ definition (abbreviated):

class ListBasedSet(collections.abc.Set):
    ''' Alternate set implementation favoring space over speed
        and not requiring the set elements to be hashable. '''
    def __init__(self, iterable):
        self.elements = lst = []
        for value in iterable:
            if value not in lst:
                lst.append(value)

The part I don't get is the self.elements = lst = [] line. Why the double assignment?

Adding some print statements:

def __init__(self, iterable):
    self.elements = lst = []
    print('elements id:', id(self.elements))
    print('lst id:', id(lst))
    for value in iterable:
        if value not in lst:
            lst.append(value)

Declaring one:

ListBasedSet(range(3))
elements id: 4741984136
lst id: 4741984136
Out[36]: <__main__.ListBasedSet at 0x11ab12fd0>

As expected, they both point to the same PyObject.

Is brevity the only reason to do something like this? If not, why? Something to do with reentrancy?


Solution

  • I'd call this a case of premature optimization; you don't save that much by eliminating the dot, especially for large input iterables; here's some timings:

    Eliminating the dot:

    %timeit ListBasedSet(range(3))
    The slowest run took 4.06 times longer than the fastest. This could mean that an intermediate result is being cached.
    100000 loops, best of 3: 2.05 µs per loop
    
    %timeit ListBasedSet(range(30))
    100000 loops, best of 3: 18.5 µs per loop
    
    %timeit ListBasedSet(range(3000))
    10 loops, best of 3: 119 ms per loop
    

    While, with the dot (i.e replace lst with self.elements:

    %timeit ListBasedSet(range(3))
    The slowest run took 5.97 times longer than the fastest. This could mean that an intermediate result is being cached.
    100000 loops, best of 3: 2.48 µs per loop
    
    %timeit ListBasedSet(range(30))
    10000 loops, best of 3: 22.8 µs per loop
    
    %timeit ListBasedSet(range(3000))
    10 loops, best of 3: 118 ms per loop
    

    As you can see, as we increase the size of the input iterable, the difference in time pretty much disappears, the appending and membership testing pretty much cover any gains.