Tags: python, dictionary, python-dataclasses

Is DataClass a good fit to replace a dictionary?


I use dictionaries as data structures a lot in my code. Instead of returning several values as a tuple, as Python permits:

def do_smth():
  [...]
  return val1, val2, val3

I prefer to use a dictionary, which has the advantage of named keys. But a complex nested dictionary is hard to navigate. When I was coding in JS several years ago I liked dictionaries too, because I could access a sub-part like thing.stuff.foo and the IDE helped me with the structure.

I just discovered the new dataclass feature in Python and I'm not sure what it is for, other than replacing a dictionary. From what I have read, a dataclass cannot have functions inside it, and the initialization of its arguments is simplified.

I would like to hear your comments on this: how do you use dataclasses, or dictionaries, in Python?


Solution

  • Dataclasses are more of a replacement for NamedTuples than for dictionaries.

    Whilst NamedTuples are designed to be immutable, dataclasses can offer that functionality by setting frozen=True in the decorator, but provide much more flexibility overall.

    If you are into type hints in your Python code, they really come into play.

    The other advantage is, like you said, complex nested dictionaries. You can define dataclasses as your types and nest them within other dataclasses in a clear and concise way.

    Consider the following:

    from dataclasses import dataclass
    from typing import List


    @dataclass
    class City:
        code: str
        population: int


    @dataclass
    class Country:
        code: str
        currency: str
        cities: List[City]


    @dataclass
    class Locations:
        countries: List[Country]
        
    

    You can then write functions that annotate a parameter with the dataclass name as a type hint and access its attributes (similar to passing in a dictionary and accessing its keys), or alternatively construct the dataclass and return it, e.g.

    def get_locations(....) -> Locations:
    ....
    

    It makes the code very readable compared with a large, complicated dictionary.
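
    As a rough sketch of what such a function could look like, the example below fills in a body for get_locations and adds a helper that walks the nested structure. The sample data, the population figures and the total_population helper are made up for illustration; they are not part of the original answer.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class City:
    code: str
    population: int


@dataclass
class Country:
    code: str
    currency: str
    cities: List[City]


@dataclass
class Locations:
    countries: List[Country]


def get_locations() -> Locations:
    # Hypothetical sample data; a real function would load this from somewhere.
    london = City(code="LON", population=8_900_000)
    leeds = City(code="LDS", population=800_000)
    uk = Country(code="GB", currency="GBP", cities=[london, leeds])
    return Locations(countries=[uk])


def total_population(locations: Locations) -> int:
    # Attribute access replaces locations["countries"][i]["cities"][j]["population"].
    return sum(
        city.population
        for country in locations.countries
        for city in country.cities
    )


locations = get_locations()
print(locations.countries[0].cities[0].code)  # LON
print(total_population(locations))            # 9700000
```

    Because every level is a declared type, an IDE or type checker can complete locations.countries[0].cities[0].code and flag a misspelled attribute, which is the JS-style navigation the question asks about.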

    You can also set defaults, which was not allowed in NamedTuples prior to 3.7 but is allowed in dictionaries.

    @dataclass
    class Stock:
        quantity: int = 0
    

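    You can also control in the decorator whether the dataclass supports ordering, just as you control whether it is frozen. Note that order=True generates the comparison methods (__lt__() and friends) so that instances can be compared and sorted; that is a different notion from key order in dictionaries, which were not ordered prior to 3.7. See the dataclasses documentation for more information.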
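
    A minimal sketch of both decorator options together, using a hypothetical Version class:

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(order=True, frozen=True)
class Version:
    major: int
    minor: int


old = Version(1, 4)
new = Version(2, 0)

# order=True generates __lt__, __le__, __gt__ and __ge__,
# comparing instances as tuples of their fields in definition order.
print(old < new)           # True
print(sorted([new, old]))  # [Version(major=1, minor=4), Version(major=2, minor=0)]

# frozen=True makes assignment to a field raise FrozenInstanceError.
try:
    old.major = 3
except FrozenInstanceError:
    print("cannot modify a frozen instance")
```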

    You get all the benefits of object comparison if you want them i.e. __eq__() etc. They also by default come with __init__ and __repr__ so you don't have to type out those methods manually like with normal classes.
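
    To make the difference concrete, here is a small comparison between a hand-written class and a dataclass. Both class names are invented for this sketch.

```python
from dataclasses import dataclass


class PlainPoint:
    def __init__(self, x: int, y: int) -> None:
        self.x = x
        self.y = y


@dataclass
class Point:
    x: int
    y: int


# A plain class compares by identity unless __eq__ is written by hand.
print(PlainPoint(1, 2) == PlainPoint(1, 2))  # False

# The dataclass compares field by field and has a readable repr.
print(Point(1, 2) == Point(1, 2))  # True
print(Point(1, 2))                 # Point(x=1, y=2)
```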

    There is also substantially more control over fields, allowing metadata etc.
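
    As an illustration of that per-field control, the sketch below attaches metadata to fields and excludes one field from the repr and from comparisons. The Measurement class and the "unit" metadata key are assumptions made for the example; the dataclasses module does not interpret metadata itself.

```python
from dataclasses import dataclass, field, fields


@dataclass
class Measurement:
    # The "unit" key is a hypothetical convention; dataclasses ignores metadata.
    distance: float = field(metadata={"unit": "km"})
    duration: float = field(metadata={"unit": "h"})
    # repr=False and compare=False keep this field out of __repr__ and __eq__.
    note: str = field(default="", repr=False, compare=False)


m = Measurement(12.5, 0.5, note="morning run")
print(m)  # Measurement(distance=12.5, duration=0.5)

# fields() returns the Field objects, whose metadata is a read-only mapping.
units = {f.name: f.metadata.get("unit") for f in fields(Measurement)}
print(units)  # {'distance': 'km', 'duration': 'h', 'note': None}
```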

    And lastly you can convert it into a dictionary at the end by importing asdict: from dataclasses import dataclass, asdict
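
    A short sketch of that conversion, with trimmed-down City and Country classes and made-up sample values:

```python
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class City:
    code: str
    population: int


@dataclass
class Country:
    code: str
    cities: List[City]


country = Country(code="GB", cities=[City(code="LON", population=8_900_000)])

# asdict recurses into nested dataclasses, lists, tuples and dicts.
as_plain_dict = asdict(country)
print(as_plain_dict)
# {'code': 'GB', 'cities': [{'code': 'LON', 'population': 8900000}]}
```

    This is handy at the boundaries of a program, for example just before serialising to JSON, while the rest of the code keeps working with typed attributes.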

    Update (Aug 2023): Thanks for the comments! Have edited to clarify those features from 3.7 that I misrepresented. Also wanted to add some further information whilst I'm here:

    "From what I have read, a dataclass cannot have functions inside it, and the initialization of its arguments is simplified."

    Just a note: you can define methods on a dataclass. By default __init__ is generated for you, but I believe this can be disabled using @dataclass(init=False), which lets you construct the object and then set its attributes (my_var = MyClass(); my_var.my_field = 42). However, I have found the __post_init__ method very handy, and you can exclude a specific field from the generated __init__ to get more control, e.g. from the docs:

    from dataclasses import dataclass, field

    @dataclass
    class C:
        a: float
        b: float
        c: float = field(init=False)

        def __post_init__(self):
            self.c = self.a + self.b
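
    For the init=False decorator option mentioned earlier, a minimal sketch might look like the following. The Settings class is hypothetical, and it gives every field a default so that the generated __repr__ still has something to read before any attribute is assigned.

```python
from dataclasses import dataclass


@dataclass(init=False)
class Settings:
    # Hypothetical class; the defaults become class attributes,
    # so they are readable even though no __init__ sets them.
    retries: int = 3
    verbose: bool = False


settings = Settings()  # falls back to object.__init__, which takes no arguments
settings.retries = 5   # attributes are set after construction
print(settings)        # Settings(retries=5, verbose=False)
```

    A field without a default would not exist on the instance until it is assigned, so printing such an object too early raises AttributeError. That is one reason __post_init__ combined with field(init=False) is usually the more convenient route.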
    

    Another useful aspect of __post_init__ is making assertions about the values. Dataclasses do not check types at runtime; the annotations are inspected only to detect special cases such as class variables (ClassVar), which are excluded from the fields but can still be used by internal methods, e.g.

    from dataclasses import dataclass
    from typing import ClassVar

    @dataclass
    class Lamp:
        # ClassVar attributes are shared constants, not fields: they are left out of __init__ and __repr__.
        valid_sockets: ClassVar[set] = {'edison_screw', 'bayonet'}
        valid_min_wattage: ClassVar[int] = 40
        valid_max_wattage: ClassVar[int] = 200
        height_cm: int
        socket: str
        wattage: int

        # Called at the end of the generated __init__, once every field is set.
        def __post_init__(self) -> None:
            assert self._is_valid_wattage(), f'Lamp requires {self.valid_min_wattage}-{self.valid_max_wattage}W bulb'
            assert self._is_valid_socket(), f'Bulb must be one of {self.valid_sockets}'

        def _is_valid_socket(self) -> bool:
            return self.socket.lower() in self.valid_sockets

        def _is_valid_wattage(self) -> bool:
            # Both bounds are exclusive: exactly 40 or 200 is rejected.
            return self.valid_min_wattage < self.wattage < self.valid_max_wattage
    
    In [27]: l = Lamp(50, 'bayonet', 80)
    In [28]: print(repr(l))
    Lamp(height_cm=50, socket='bayonet', wattage=80)
    In [29]: l = Lamp(50, 'bayonet', 300)
    ---------------------------------------------------------------------------
    AssertionError                            Traceback (most recent call last)
    Cell In [29], line 1
    ----> 1 l = Lamp(50, 'bayonet', 300)
    
    File <string>:6, in __init__(self, height_cm, socket, wattage)
    
    Cell In [25], line 11, in Lamp.__post_init__(self)
         10 def __post_init__(self) -> None:
    ---> 11     assert self._is_valid_wattage(), f'Lamp requires {self.valid_min_wattage}-{self.valid_max_wattage}W bulb'
         12     assert self._is_valid_socket(), f'Bulb must be one of {self.valid_sockets}'
    
    AssertionError: Lamp requires 40-200W bulb