
Python 3 - Which one is faster for accessing data: dataclasses or dictionaries?


Python 3.7 introduced dataclasses to store data. I'm considering moving to this new approach, which is more organized and better structured than a dict.

But I have a question. Python hashes the keys of a dict, and that makes looking up keys and values much faster. Do dataclasses implement something similar?

Which one is faster and why?


Solution

  • By default, classes in Python use a dictionary (the instance's __dict__) under the hood to store their attributes, as you can read in the documentation. For a more in-depth reference on how Python classes (and many other things) work, you can also check out the documentation on Python's data model, in particular the section on custom classes.
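As a small illustration of that point, the sketch below inspects the per-instance dictionary of a dataclass. The class name Point and its fields are hypothetical and exist only for this example.

```python
from dataclasses import dataclass


@dataclass
class Point:  # hypothetical example class
    x: int
    y: int


p = Point(1, 2)

# Instance attributes live in an ordinary dict attached to the object.
print(p.__dict__)             # {'x': 1, 'y': 2}
print(vars(p) is p.__dict__)  # True

# Writing to that dict is the same as setting the attribute.
p.__dict__["x"] = 10
print(p.x)                    # 10
```

So an attribute lookup such as p.x ends up as a hashed dictionary lookup, much like d['x'], with some extra work around it.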

    So in general, there shouldn't be a loss in performance by moving from dictionaries to dataclasses. But it's better to make sure with the timeit module:


    Baseline

    # dictionary creation
    $ python -m timeit "{'var': 1}"
    5000000 loops, best of 5: 52.9 nsec per loop
    
    # dictionary key access
    $ python -m timeit -s "d = {'var': 1}" "d['var']"
    10000000 loops, best of 5: 20.3 nsec per loop
    

    Basic dataclass

    # dataclass creation
    $ python -m timeit -s "from dataclasses import dataclass" -s "@dataclass" -s "class A: var: int" "A(1)" 
    1000000 loops, best of 5: 288 nsec per loop
    
    # dataclass attribute access
    $ python -m timeit -s "from dataclasses import dataclass" -s "@dataclass" -s "class A: var: int" -s "a = A(1)" "a.var" 
    10000000 loops, best of 5: 25.3 nsec per loop
    

    Here we can see that using classes does have some overhead. For instance creation it's quite a bit (~5 times slower), but you don't necessarily need to care that much about it as long as you don't plan to create and discard dataclass instances in a hot loop.

    The attribute access is probably the more important metric, and while dataclasses are again slower (~1.25 times), this time it's not by that much.
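The command-line measurements above can also be repeated from a script, which is handy for testing your own classes. This is a minimal sketch using timeit.repeat; the helper name best_ns and the loop count are assumptions made for this example, and the numbers will differ between machines and Python versions.

```python
import timeit
from dataclasses import dataclass


@dataclass
class A:
    var: int


d = {"var": 1}
a = A(1)
namespace = {"A": A, "a": a, "d": d}
LOOPS = 200_000  # arbitrary loop count, chosen to keep the run short


def best_ns(stmt):
    """Best of 5 runs, in nanoseconds per loop (hypothetical helper)."""
    runs = timeit.repeat(stmt, globals=namespace, repeat=5, number=LOOPS)
    return min(runs) / LOOPS * 1e9


results = {
    "dict creation": best_ns("{'var': 1}"),
    "dict key access": best_ns("d['var']"),
    "dataclass creation": best_ns("A(1)"),
    "dataclass attribute access": best_ns("a.var"),
}

for label, nsec in results.items():
    print(f"{label}: {nsec:.1f} nsec per loop")
```

Taking the minimum of several runs mirrors the "best of 5" figure that python -m timeit reports.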

    If you think that's still a tad too slow, you can tune your dataclass (or any class, really) by using slots instead of a dictionary to store its attributes:


    Slotted dataclass

    # dataclass creation
    $ python -m timeit -s "from dataclasses import dataclass" -s "@dataclass" -s "class A: __slots__ = ('var',); var: int" "A(1)" 
    1000000 loops, best of 5: 242 nsec per loop
    
    # dataclass attribute access
    $ python -m timeit -s "from dataclasses import dataclass" -s "@dataclass" -s "class A: __slots__ = ('var',); var: int" -s "a = A(1)" "a.var"
    10000000 loops, best of 5: 21.7 nsec per loop
    

    By using this pattern we could shave off a few more nanoseconds. At this point, at least regarding attribute access, there shouldn't be a noticeable difference from dictionaries any more, and you can use the upsides of dataclasses without compromising speed.
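Written out as a normal class definition, the slotted pattern might look like the sketch below. The class name SlottedPoint is hypothetical. One limitation of declaring __slots__ by hand: fields cannot have default values, because the default would become a class variable that conflicts with the slot. From Python 3.10 on, @dataclass(slots=True) generates the slots for you and avoids that problem.

```python
from dataclasses import dataclass


@dataclass
class SlottedPoint:  # hypothetical example class
    __slots__ = ("x", "y")  # fixed attribute storage, no per-instance dict
    x: int
    y: int


p = SlottedPoint(1, 2)

# The instance has no __dict__ any more.
print(hasattr(p, "__dict__"))  # False

# Attributes outside __slots__ can no longer be added.
try:
    p.z = 3
except AttributeError as exc:
    print("cannot add attribute:", exc)

# The generated dataclass methods still work as usual.
print(p == SlottedPoint(1, 2))  # True
```

Besides the slightly faster attribute access, slotted instances also use less memory, at the cost of not being able to attach arbitrary attributes later.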