TL;DR: I've created a dataclass in which not all fields are defined in init; some are added in __post_init__ based on an InitVar. When I print the object, only the fields from init are printed, not those added in __post_init__. The same happens when I convert the object to a pandas DataFrame (which is what I ultimately want to achieve). How do I need to change the class code or the print/pandas statements to get all dataclass fields and values?
This is an extract of my dataclass definition:
@dataclass
class PcpCompound:
    compound: InitVar[Compound]  # Compound: class of the pubchempy package
    query_status: str
    query_term: str

    def __post_init__(self, compound: Compound | None):
        if compound is None:
            return
        self.query_finding: str = compound.iupac_name
Results and expectations are shown down below.
My ideal solution would be to move everything from __post_init__ to init, but that would require checking compound for None there and copying the wanted details from it to the respective dataclass fields. Unfortunately, I couldn't figure out how to use compound programmatically in init, and I believe this is intended.
If this is not possible, how would I solve this? I would prefer not to define all fields in init, as this is not only tedious and doubles the amount of code, but would also lead to a lot of None values if compound is itself None.
The calls to construct an object and to print it or convert it to a DataFrame:
PcpCompound = PcpCompound(query_status=status, query_term=query_term, compound=compound)
print(PcpCompound)
PdfCompound = pd.DataFrame([PcpCompound])
print(PdfCompound)
Actual output:
PcpCompound(query_status='Success!', query_term='110-89-4')
  query_status query_term
0     Success!   110-89-4
Expected output:
PcpCompound(query_status='Success!', query_term='110-89-4', query_finding='Piperidine')
  query_status query_term query_finding
0     Success!   110-89-4    Piperidine
Note: All expected values are correctly present in the object, as shown in the VSCode debug view. This can also be verified with print(PcpCompound.query_finding), which returns Piperidine as expected.
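query_finding isn't included in the print output or in the DataFrame because dataclass simply isn't aware of it. The decorator only registers attributes that are declared with a type annotation in the class body. An attribute assigned in __post_init__ is an ordinary instance attribute, even when the assignment carries an annotation, so the generated __repr__ (and, by extension, pandas) never sees it.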
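To make the distinction concrete, here is a minimal sketch using only the standard library. The class name Undeclared and the str-typed source InitVar are hypothetical stand-ins for the question's class and the pubchempy compound; the point is only to compare what dataclasses.fields() reports with what the instance actually holds.

```python
from dataclasses import InitVar, asdict, dataclass, fields


@dataclass
class Undeclared:
    source: InitVar[str]  # hypothetical stand-in for the real `compound`
    query_term: str

    def __post_init__(self, source: str) -> None:
        # Plain instance attribute: set at runtime, never registered as a field.
        self.query_finding = source.upper()


u = Undeclared(source="piperidine", query_term="110-89-4")

field_names = [f.name for f in fields(u)]
print(field_names)  # ['query_term']
print(vars(u))      # {'query_term': '110-89-4', 'query_finding': 'PIPERIDINE'}
print(asdict(u))    # {'query_term': '110-89-4'}
print(u)            # Undeclared(query_term='110-89-4')
```

The value is stored on the instance (it shows up in vars), but everything driven by the field list, such as the generated repr and asdict, skips it.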
The solution is to declare the field in the class body with field(init=False), which tells dataclass that the field is not a parameter of __init__ and is instead set in __post_init__:
from dataclasses import InitVar, dataclass, field

import pandas as pd


@dataclass
class PcpCompound:
    compound: InitVar[int]  # using `int` for reproducibility
    query_status: str
    query_term: str
    query_finding: int = field(init=False)  # `int` only because the stand-in query below returns one

    def __post_init__(self, compound: int) -> None:
        self.query_finding = compound * 2  # pretend this is a real query


c = PcpCompound(compound=123, query_status="Success!", query_term="someterm")
print(c)
print(pd.DataFrame([c]))
Output:
PcpCompound(query_status='Success!', query_term='someterm', query_finding=246)
  query_status query_term query_finding
0     Success!   someterm           246
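One caveat for the original use case: the question's __post_init__ returns early when compound is None, and a field declared with init=False but no default would then never be assigned, so printing the object would fail with an AttributeError. A possible way around this, sketched below, is to give the field a default. FakeCompound is a hypothetical stand-in for pubchempy's Compound, and only the iupac_name attribute mentioned in the question is assumed.

```python
from dataclasses import InitVar, asdict, dataclass, field
from typing import Optional


@dataclass
class FakeCompound:
    """Hypothetical stand-in for pubchempy's Compound."""

    iupac_name: str


@dataclass
class PcpCompound:
    compound: InitVar[Optional[FakeCompound]]
    query_status: str
    query_term: str
    # Declared as a field, kept out of __init__, with a fallback for failed queries.
    query_finding: Optional[str] = field(init=False, default=None)

    def __post_init__(self, compound: Optional[FakeCompound]) -> None:
        if compound is not None:
            self.query_finding = compound.iupac_name


hit = PcpCompound(
    compound=FakeCompound("Piperidine"),
    query_status="Success!",
    query_term="110-89-4",
)
miss = PcpCompound(compound=None, query_status="Not found", query_term="no-such-term")

print(hit)
print(asdict(miss))
```

With this shape every instance has the same set of fields, so a list mixing hits and misses should convert to a DataFrame with a query_finding column that holds None for the misses.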