I'm currently writing some code for an option pricer, and at the same time I've been experimenting with Python dataclasses. Here I have two classes, Option() and Option2(), with the former written in dataclass syntax and the latter in conventional class syntax.
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Option:
    is_american: Optional[bool] = field(default=False)
    is_european: Optional[bool] = not is_american


class Option2:
    def __init__(self, is_american=False):
        self.is_european = not is_american


if __name__ == "__main__":
    eu_option1 = Option()
    print(f"{eu_option1.is_european = }")

    eu_option2 = Option2()
    print(f"{eu_option2.is_european = }")
The output gives
eu_option1.is_european = False
eu_option2.is_european = True
However, something very strange happened. Notice how in the Option2() case, is_american is set to False by default, and hence is_european must be True, and it indeed is, so this is expected behaviour.

But in the dataclass Option() case, is_american is also set to False by default. However, for whatever reason, the dataclass did not trigger the is_european: Optional[bool] = not is_american line, and hence is_european is still False when it is supposed to be True.
What is going on here? Did I use my dataclass incorrectly?
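The dataclass is behaving as defined; the problem is when the default gets evaluated. The class body runs once, at class definition time, and at that point is_american is not False but the Field object returned by field(default=False). A Field object is truthy, so not is_american evaluates to False, and that constant becomes the default for is_european on every instance, regardless of what is_american is set to later.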
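A quick way to see what the class body is working with is to evaluate the same expressions outside a class. This is only an illustrative check, not part of the original code:

```python
from dataclasses import Field, field

# field() returns a dataclasses.Field object, not the default value itself.
is_american = field(default=False)

print(isinstance(is_american, Field))  # True
print(is_american.default)             # False
print(not is_american)                 # False, because a Field object is truthy
```

So the line is_european: Optional[bool] = not is_american is effectively is_european: Optional[bool] = False.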
There is a built-in mechanism for fields that depend on other fields. Declare the dependent field with init=False, so it is not a constructor parameter, and compute its value in a __post_init__() method, which the generated __init__() calls after the other fields have been assigned.
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Option:
    is_american: Optional[bool] = field(default=False)
    # Excluded from __init__; the value is derived in __post_init__ instead.
    is_european: Optional[bool] = field(init=False)

    def __post_init__(self):
        self.is_european = not self.is_american
Personally I'd get rid of is_european altogether and use a getter to compute the value when it's requested. There's no need to hold the extra value if it's always going to be directly derived from another value. Just calculate it on the fly when it's called.
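As a minimal sketch of that idea, assuming is_european only ever needs to be read, the dataclass can keep a single stored field and expose the derived value as a read-only property:

```python
from dataclasses import dataclass


@dataclass
class Option:
    is_american: bool = False

    @property
    def is_european(self) -> bool:
        # Derived on every access, so it cannot drift out of sync.
        return not self.is_american


eu_option = Option()
print(eu_option.is_european)  # True

us_option = Option(is_american=True)
print(us_option.is_european)  # False
```

Unlike a value stored once in __post_init__(), the property also stays correct if is_american is reassigned after construction.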
In many languages you wouldn't access attributes directly; you'd go through accessor functions (get, set, etc.) such as get_is_american() or get_country(). Python handles this neatly with the @property decorator. You can start with plain attribute access when first setting up a class, then move to managed access later without having to change the code that uses the attribute. Examples:
# Rename the stored attribute from is_american to _is_american to discourage direct access.
# A getter is the default behaviour of @property, so it needs no extra qualifier.
@property
def is_american(self):
    return self._is_american

@property
def is_european(self):
    return not self._is_american

# Allow the values to be set. A setter is registered on the existing
# property object, so the decorator is @<property name>.setter.
@is_american.setter
def is_american(self, america_based: bool):
    self._is_american = america_based

@is_european.setter
def is_european(self, europe_based: bool):
    self._is_american = not europe_based
This could then be called as follows:
print(my_object.is_american)
my_object.is_american = False
print(my_object.is_european)
Did you see how flexible that approach is? If you have more countries than just the US and Europe, or if you think the set might expand, you can change the storage to a string or an enum and derive the return values in the accessors. Example:
# Imagine country is now stored as a string
@property
def is_american(self):
    return self.country == 'US'

@property
def is_european(self):
    return self.country == 'EU'

@property
def country(self):
    return self._country

@country.setter
def country(self, new_country: str):
    self._country = new_country

# The boolean setters keep working by translating to a country code.
@is_american.setter
def is_american(self, america_check: bool):
    if america_check:
        self._country = "US"
    else:
        self._country = "EU"

@is_european.setter
def is_european(self, europe_check: bool):
    if europe_check:
        self._country = "EU"
    else:
        self._country = "US"
Notice how, if you already have existing code that calls is_american, none of the accessing code has to be changed, even though country is now stored, and available, as a string.
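The same idea with an enum might look like the sketch below. The Country enum and its members are hypothetical names chosen for illustration; they are not part of the original code.

```python
from enum import Enum


class Country(Enum):  # hypothetical enum, introduced only for this sketch
    US = "US"
    EU = "EU"


class Option:
    def __init__(self, country: Country = Country.EU):
        # Country(...) accepts a member or its value and raises ValueError otherwise.
        self._country = Country(country)

    @property
    def country(self) -> Country:
        return self._country

    @country.setter
    def country(self, new_country: Country) -> None:
        self._country = Country(new_country)

    @property
    def is_american(self) -> bool:
        return self._country is Country.US

    @property
    def is_european(self) -> bool:
        return self._country is Country.EU


option = Option()
print(option.is_european)  # True
option.country = Country.US
print(option.is_american)  # True
```

Compared with a bare string, the enum rejects unknown values at assignment time, so a typo such as "UK" fails immediately instead of silently making both checks return False.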