
Dictionary comprehension multiple ways


What is the difference between the following two statements in Python?

l = [1,2,3,4]
a = {item:0 for item in l}
b = dict((item,0) for item in l)
a == b
# True

I believe the first is the proper way to build a dictionary with a comprehension, as described in the PEP, while the second seems to create a generator expression and then build a dict from it (so maybe it does exactly the same thing as the first approach behind the scenes?). What is actually the difference between the two, and which one should be preferred?


Solution

    a = {item:0 for item in l}
    

    Builds the dict directly, with no intermediate objects.

    b = dict((item,0) for item in l)
    

    Creates a generator expression that yields a tuple for each item in the list, and feeds those tuples to the dict() constructor.
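To make that intermediate step visible, here is a small sketch (the names `items`, `pairs` and `rest` are just for illustration) showing that the generator expression yields one (key, value) tuple at a time, and that dict() accepts any iterable of such pairs, such as a list of tuples or a zip object.

```python
items = [1, 2, 3, 4]

# A generator expression: nothing is computed until something iterates over it.
pairs = ((item, 0) for item in items)

# Each step of the generator produces one (key, value) tuple.
first_pair = next(pairs)
print(first_pair)  # (1, 0)

# dict() consumes the remaining tuples; the pair already taken by next() is gone.
rest = dict(pairs)
print(rest)  # {2: 0, 3: 0, 4: 0}

# dict() accepts any iterable of key/value pairs, not only generators.
from_list = dict([(item, 0) for item in items])
from_zip = dict(zip(items, [0] * len(items)))
assert from_list == from_zip == {item: 0 for item in items}
```

Note that the generator is single-use: once dict() has consumed it, iterating over it again yields nothing.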

    Short of looking at the resulting Python bytecode, there's no easy way of seeing exactly how they differ. Performance-wise, I would have guessed they are very close, but a quick measurement says otherwise.
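Looking at the bytecode is actually not hard: the standard-library dis module disassembles both versions. The sketch below (the helper name `opnames` and the two constant names are made up for this example) collects the opcode names used by each expression, including those inside nested code objects, because before Python 3.12 a comprehension is compiled into its own nested code object, and a generator expression always is. Exact opcodes vary between Python versions, but the comprehension adds entries with MAP_ADD, while the generator version builds a tuple (BUILD_TUPLE) and suspends with YIELD_VALUE for every item.

```python
import dis
import types

COMPREHENSION = "{item: 0 for item in l}"
GENERATOR_TO_DICT = "dict((item, 0) for item in l)"


def opnames(code):
    """Return the set of opcode names in a code object and any nested code objects."""
    names = {instr.opname for instr in dis.get_instructions(code)}
    for const in code.co_consts:
        # Comprehensions and generator expressions may be compiled as nested code objects.
        if isinstance(const, types.CodeType):
            names |= opnames(const)
    return names


comp_ops = opnames(compile(COMPREHENSION, "<comprehension>", "eval"))
gen_ops = opnames(compile(GENERATOR_TO_DICT, "<generator>", "eval"))

if __name__ == "__main__":
    # Full disassembly of both source strings, including nested code objects.
    dis.dis(COMPREHENSION)
    print("-" * 40)
    dis.dis(GENERATOR_TO_DICT)
```

Compiling only the expressions is enough here; `l` never has to exist because nothing is evaluated.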

    The main thing I would consider here is readability and maintainability. The first way uses only the elements you need: there's no intermediate data type (tuple) and no explicit call to a type; the language itself hooks things up. As a bonus, it's shorter and simpler. I don't see any advantage in the second option, except maybe that the explicit dict tells others what type to expect. But if they don't get that from the {} in the first version, I doubt they're much good anyway...
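One concrete consequence of not calling a type directly: the comprehension syntax always produces a real dict, whereas dict(...) is an ordinary name lookup followed by a call, so it uses whatever `dict` happens to be bound to. The sketch below shadows `dict` on purpose (the function name `build_both` and the rebinding to `list` are purely hypothetical, only there to make the lookup visible).

```python
def build_both(items):
    # Hypothetical: a local variable that shadows the built-in dict type.
    dict = list
    a = {item: 0 for item in items}        # syntax: always a real dict
    b = dict((item, 0) for item in items)  # name lookup: now calls list()
    return a, b


a, b = build_both([1, 2, 3])
print(type(a).__name__, a)  # dict {1: 0, 2: 0, 3: 0}
print(type(b).__name__, b)  # list [(1, 0), (2, 0), (3, 0)]
```

The name lookup and call happen once per evaluation of the expression, not once per item, so they are only a small part of the second version's overhead.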

    I figured I'd test the speed:

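    from timeit import timeit
    from random import randint
    
    # 1000 random ints in [0, 1000], so some values repeat
    l = [randint(0, 1000) for _ in range(1000)]
    
    
    def first():
        # dict comprehension
        return {item: 0 for item in l}
    
    
    def second():
        # generator expression of (key, value) tuples, passed to dict()
        return dict((item, 0) for item in l)
    
    
    # total seconds for 10,000 calls of each function
    print(timeit(first, number=10000))
    print(timeit(second, number=10000))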
    

    Result:

    0.46899440000000003
    1.0817516999999999
    

    The comprehension was consistently faster as well (more than twice as fast in this run), so there seems to be no reason to ever use the second option. If anything here is surprising, it's how much slower the second version is.
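To see where the second version loses time, one option is to time a middle variant that still builds a tuple per item but skips the generator, by passing dict() a list of tuples. This is a sketch, not a rigorous benchmark: the function names are made up, timings depend on the machine and Python version, and taking the minimum of several timeit.repeat runs is just one common way to reduce noise. No results are shown because they will differ from run to run.

```python
from random import randint
from timeit import repeat

l = [randint(0, 1000) for _ in range(1000)]


def comprehension():
    return {item: 0 for item in l}


def dict_from_list_of_tuples():
    # Builds a tuple per item, but dict() iterates a ready-made list, not a generator.
    return dict([(item, 0) for item in l])


def dict_from_generator():
    # Builds a tuple per item, and dict() resumes the generator for each one.
    return dict((item, 0) for item in l)


def best_time(func, number=1000, runs=5):
    """Smallest total time (seconds) over several timeit runs."""
    return min(repeat(func, number=number, repeat=runs))


if __name__ == "__main__":
    for func in (comprehension, dict_from_list_of_tuples, dict_from_generator):
        print(f"{func.__name__:<26} {best_time(func):.4f}s")
```

If the list-of-tuples variant lands between the other two, that would suggest both the per-item tuples and the generator's per-item suspend/resume contribute to the gap.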