
Why is my Python dataclass not initialising boolean correctly?


I'm currently writing some code for an option pricer, and at the same time I've been experimenting with Python dataclasses. Here I have two classes, Option() and Option2(). The former is written with dataclass syntax and the latter with conventional class syntax.

from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Option:
    is_american: Optional[bool] = field(default=False)
    is_european: Optional[bool] = not is_american

class Option2:
    def __init__(self, is_american=False):
        self.is_american = is_american
        self.is_european = not is_american

if __name__ == "__main__":
    eu_option1 = Option()
    print(f"{eu_option1.is_european = }")
    
    eu_option2 = Option2()
    print(f"{eu_option2.is_european = }")

The output gives

eu_option1.is_european = False
eu_option2.is_european = True

However, something strange happened. In the Option2() case, is_american defaults to False, so is_european should be True, and it is. That is the expected behaviour.

But in the dataclass Option() case, is_american also defaults to False. For some reason, though, the dataclass did not apply is_european: Optional[bool] = not is_american, so is_european is False when it should be True.

What is going on here? Did I use my dataclass incorrectly?


Solution

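  • The problem isn't the dataclass constructor or the order of the fields. It is about when the class body runs. The line is_european: Optional[bool] = not is_american is evaluated once, at class-definition time, before any instance exists. At that point the name is_american refers to the Field object returned by field(default=False), not to False. A Field object is truthy, so not is_american evaluates to False, and that constant becomes the default of is_european for every instance. (Even with a plain is_american: Optional[bool] = False, you would get a fixed default of True that ignores whatever is passed to __init__.)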
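
A minimal sketch that makes the class-definition-time evaluation visible. PlainDefault is a hypothetical comparison class that is not part of the question. It shows that a plain default is also computed only once.

```python
from dataclasses import dataclass, field, fields
from typing import Optional


@dataclass
class Option:
    is_american: Optional[bool] = field(default=False)
    # Here is_american names the Field object returned by field(), not False.
    # Field defines no __bool__, so it is truthy and "not is_american" is False.
    is_european: Optional[bool] = not is_american


# Hypothetical comparison: a plain default yields True, but still only once
@dataclass
class PlainDefault:
    is_american: bool = False
    is_european: bool = not is_american


print(bool(field(default=False)))                   # True
print({f.name: f.default for f in fields(Option)})  # {'is_american': False, 'is_european': False}
print(PlainDefault(is_american=True).is_european)   # True: never recomputed
```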

    The dataclasses module has a built-in hook for fields whose values depend on other fields. Mark the dependent field with init=False so that it is not an __init__ parameter, and compute it in a __post_init__() method. The generated __init__ calls that method after it has assigned the regular fields.

    from dataclasses import dataclass, field
    from typing import Optional
    
    @dataclass
    class Option:
        is_american: Optional[bool] = field(default=False)
        # init=False: not an __init__ parameter; set in __post_init__ instead
        is_european: Optional[bool] = field(init=False)
    
        def __post_init__(self):
            # Runs after __init__ has assigned is_american
            self.is_european = not self.is_american
    
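A quick check of the __post_init__ version. The class is repeated so that the snippet runs on its own, and the printed reprs are what the dataclass-generated __repr__ produces for these two fields.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Option:
    is_american: Optional[bool] = field(default=False)
    is_european: Optional[bool] = field(init=False)

    def __post_init__(self):
        # Called by the generated __init__ after is_american is assigned
        self.is_european = not self.is_american


print(Option())                  # Option(is_american=False, is_european=True)
print(Option(is_american=True))  # Option(is_american=True, is_european=False)

try:
    Option(is_european=True)     # rejected: init=False removes it from __init__
except TypeError as exc:
    print("TypeError:", exc)
```

Note that is_european is computed only once, inside __post_init__. If you assign to is_american later, is_european is not updated. That is one reason to prefer the property approach.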

    Personally, I'd get rid of is_european altogether and compute it on demand with a property. There's no need to store an extra value that is always derived directly from another one. Just calculate it when it's accessed.
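
A minimal sketch of that idea applied to the question's dataclass. This assumes it is acceptable for is_european to be read-only. A property carries no type annotation, so the dataclass does not treat it as a field.

```python
from dataclasses import dataclass


@dataclass
class Option:
    is_american: bool = False

    # Not a field (no annotation), so it is not an __init__ parameter and is
    # recomputed on every access from the current value of is_american
    @property
    def is_european(self) -> bool:
        return not self.is_american


opt = Option()
print(opt.is_european)  # True
opt.is_american = True
print(opt.is_european)  # False
print(opt)              # Option(is_american=True)
```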

    In many languages you wouldn't access attributes directly. Instead you'd go through accessor functions (get, set, etc.) such as get_is_american() or get_country(). Python handles this neatly with the @property decorator. You can use plain attribute access when you first write a class, then switch to managed access later without having to change the code that reads or writes the attribute. Examples:

    # Rename is_american to _is_american so it isn't accessed directly
    class Option:
        def __init__(self, is_american: bool = False):
            # Goes through the is_american setter defined below
            self.is_american = is_american
    
        # Getting is the default action, so no extra decorator is needed
        @property
        def is_american(self) -> bool:
            return self._is_american
    
        # Allow the value to be set; the setter is named after its property
        @is_american.setter
        def is_american(self, america_based: bool) -> None:
            self._is_american = america_based
    
        @property
        def is_european(self) -> bool:
            return not self._is_american
    
        @is_european.setter
        def is_european(self, europe_based: bool) -> None:
            self._is_american = not europe_based
    
    

    This could then be called as follows:

    print(my_object.is_american)
    my_object.is_american = False
    print(my_object.is_european)
    
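The switch from direct access to managed access without touching callers can be sketched like this. OptionV1, OptionV2, the bool type check and flip() are all hypothetical names used only for illustration. The caller is the same for both versions.

```python
class OptionV1:
    def __init__(self, is_american: bool = False):
        self.is_american = is_american  # plain attribute, direct access


class OptionV2:
    def __init__(self, is_american: bool = False):
        self.is_american = is_american  # now routed through the setter below

    @property
    def is_american(self) -> bool:
        return self._is_american

    @is_american.setter
    def is_american(self, value: bool) -> None:
        # Hypothetical validation, added without changing any caller
        if not isinstance(value, bool):
            raise TypeError("is_american must be a bool")
        self._is_american = value


def flip(option):
    # Caller code: identical for OptionV1 and OptionV2
    option.is_american = not option.is_american
    return option.is_american


print(flip(OptionV1()), flip(OptionV2()))  # True True
```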

    Notice how flexible that approach is? If you have more markets than US and European, or you expect the logic to grow, you can change the storage to a string or an enum and derive the return values in the accessors. Example:

    # Imagine the country is now stored as a string
    class Option:
        def __init__(self, country: str = "EU"):
            self._country = country
    
        @property
        def country(self) -> str:
            return self._country
    
        @country.setter
        def country(self, new_country: str) -> None:
            self._country = new_country
    
        @property
        def is_american(self) -> bool:
            return self._country == "US"
    
        @is_american.setter
        def is_american(self, america_check: bool) -> None:
            # Only two markets are modelled, so "not American" means "EU"
            self._country = "US" if america_check else "EU"
    
        @property
        def is_european(self) -> bool:
            return self._country == "EU"
    
        @is_european.setter
        def is_european(self, europe_check: bool) -> None:
            self._country = "EU" if europe_check else "US"
    

    Notice that if you already have code that reads is_american, none of it has to change, even though the country is now stored as a string and is also available as one through country.
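
The enum alternative mentioned above might look like the following minimal sketch. It assumes a hypothetical Market enum with only US and EU members. This Option class is illustrative and not a drop-in replacement for the code above.

```python
from enum import Enum


class Market(Enum):
    # Hypothetical enum; more members can be added later
    US = "US"
    EU = "EU"


class Option:
    def __init__(self, market: Market = Market.EU):
        self._market = market

    @property
    def market(self) -> Market:
        return self._market

    @market.setter
    def market(self, new_market: Market) -> None:
        self._market = new_market

    # Callers that only read is_american / is_european are unaffected
    @property
    def is_american(self) -> bool:
        return self._market is Market.US

    @property
    def is_european(self) -> bool:
        return self._market is Market.EU


opt = Option()
print(opt.is_european)  # True
opt.market = Market.US
print(opt.is_american)  # True
```

I left out the boolean setters here on purpose. Once there are more than two markets, "not American" no longer identifies a single market, so setting the enum directly is less ambiguous.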