I have a dataclass and I want to iterate over it in a loop to print each of its values. I can write a very short __iter__() within it easily enough, but is that what I should be doing? I don't see anything in the documentation about an 'iterable' parameter or anything similar, but I feel like there ought to be one.
Here is what I have, which, again, works fine.
from dataclasses import dataclass

@dataclass
class MyDataClass:
    a: float
    b: float
    c: float

    def __iter__(self):
        for value in self.__dict__.values():
            yield value

thing = MyDataClass(1, 2, 3)
for i in thing:
    print(i)
# outputs 1, 2, 3 on separate lines, as expected
Is this the best / most direct way to do this?
The simplest approach is probably to extract the fields lazily, following the guidance in the dataclasses.astuple documentation for creating a shallow copy, just omitting the call to tuple. That leaves a generator expression, which is a legal iterator for __iter__ to return:
import dataclasses  # needed for dataclasses.fields below

def __iter__(self):
    return (getattr(self, field.name) for field in dataclasses.fields(self))

# Or writing it directly as a generator itself instead of returning a genexpr:
def __iter__(self):
    for field in dataclasses.fields(self):
        yield getattr(self, field.name)
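For reference, here is a minimal self-contained sketch of the first variant dropped into the class from the question. The names values, a, b, and c at module level are only there for illustration.

```python
import dataclasses
from dataclasses import dataclass


@dataclass
class MyDataClass:
    a: float
    b: float
    c: float

    def __iter__(self):
        # Lazily yield each declared field's current value, in declaration order.
        return (getattr(self, f.name) for f in dataclasses.fields(self))


thing = MyDataClass(1, 2, 3)
values = list(thing)   # collect the iterated values
a, b, c = thing        # iterable unpacking works as well
```

One difference from the __dict__ version: dataclasses.fields only reports declared fields, so attributes attached to an instance later are skipped, and the approach should also work for dataclasses created with slots=True, which have no __dict__.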
Unfortunately, astuple itself is not suitable, as it recurses, unpacking nested dataclasses and structures. asdict (followed by a .values() call on the result) works for flat dataclasses, though it also recurses and turns nested dataclasses into dicts. It eagerly constructs a temporary dict and recursively copies the contents, which is relatively heavyweight in both memory and CPU; better to avoid unnecessary O(n) eager work.
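To make the recursion concrete, here is a small sketch comparing the three approaches on a nested dataclass. Inner and Outer are hypothetical classes made up for this illustration.

```python
import dataclasses
from dataclasses import dataclass


@dataclass
class Inner:
    x: int
    y: int


@dataclass
class Outer:
    label: str
    inner: Inner

    def __iter__(self):
        # Shallow: yields the attribute objects themselves.
        return (getattr(self, f.name) for f in dataclasses.fields(self))


outer = Outer("p", Inner(1, 2))

flattened = dataclasses.astuple(outer)                      # nested Inner becomes a tuple
as_dict_values = list(dataclasses.asdict(outer).values())   # nested Inner becomes a dict
shallow = list(outer)                                       # nested Inner stays an Inner
```

Only the fields-based iteration hands back the original Inner instance; the other two replace it with a converted copy.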
asdict would be suitable if you want or need to avoid live views: if attributes of the instance that haven't been reached yet are replaced or modified midway through iterating, the values from asdict won't change, since they are deep copied up front, while the genexpr would reflect the newer values when it reached them. The implementation using asdict is even simpler (if slower, due to the eager deep copy):
def __iter__(self):
    yield from dataclasses.asdict(self).values()

# or avoiding a generator function:
def __iter__(self):
    return iter(dataclasses.asdict(self).values())
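The live-view versus snapshot difference can be seen in a short sketch. Live and Snapshot are hypothetical class names used only to put the two __iter__ styles side by side.

```python
import dataclasses
from dataclasses import dataclass


@dataclass
class Live:
    a: int
    b: int

    def __iter__(self):
        # Each value is read only when the iterator reaches it.
        return (getattr(self, f.name) for f in dataclasses.fields(self))


@dataclass
class Snapshot:
    a: int
    b: int

    def __iter__(self):
        # All values are copied when iter() is called.
        return iter(dataclasses.asdict(self).values())


live = Live(1, 2)
it = iter(live)
next(it)               # consume a
live.b = 99            # modify a field not yet reached
live_rest = list(it)   # sees the new value

snap = Snapshot(1, 2)
it = iter(snap)
next(it)               # consume a
snap.b = 99            # modify a field not yet reached
snap_rest = list(it)   # still sees the value copied up front
```

Note that with the yield from variant the copy is taken on the first next() call rather than at iter() time, because a generator function's body does not start running until then.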
There is a third option, which is to ditch dataclasses entirely. If you're okay with making your class behave like an immutable sequence, then you get iterability for free by making it a typing.NamedTuple (or the older, less flexible collections.namedtuple) instead, e.g.:
from typing import NamedTuple

class MyNotADataClass(NamedTuple):
    a: float
    b: float
    c: float

thing = MyNotADataClass(1, 2, 3)
for i in thing:
    print(i)
# outputs 1, 2, 3 on separate lines, as expected
That is iterable automatically. You can also call len on it, index it, or slice it, because it's an actual subclass of tuple with all the tuple behaviors; it just also exposes its contents through named attributes.
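A brief sketch of those tuple behaviors, along with the immutability trade-off mentioned earlier. The module-level names (length, last, head, mutated, updated) are illustrative only.

```python
from typing import NamedTuple


class MyNotADataClass(NamedTuple):
    a: float
    b: float
    c: float


thing = MyNotADataClass(1, 2, 3)

length = len(thing)   # number of fields
last = thing[-1]      # indexing, including negative indices
head = thing[:2]      # slicing gives back a plain tuple
a, b, c = thing       # unpacking

# Fields cannot be reassigned on an existing instance.
try:
    thing.a = 10
    mutated = True
except AttributeError:
    mutated = False

# _replace builds a new instance with the given fields changed.
updated = thing._replace(a=10)
```

If you need mutable fields, this option is probably not a good fit, and one of the __iter__ approaches on a regular dataclass is the safer choice.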