I would like to create a typed DataFrame from a Pydantic BaseModel class, let's call it MyModel, that has Optional fields. As I create multiple instances of MyModel, some will have Optional fields set to None, and if I initialize a DataFrame with such rows, the columns may end up with inconsistent dtypes. I'd thus like to cast Optional[TypeX] to TypeX, e.g.:
import pydantic
import pandas as pd
import numpy as np
from typing import Optional

class MyModel(pydantic.BaseModel):
    thisfield: int
    thatfield: Optional[str]
    ...

col_types = {kk: ff.annotation for kk, ff in MyModel.model_fields.items()}
pd.DataFrame(np.empty(0, dtype=[tuple(tt) for tt in col_types.items()]))
This fails with TypeError: Cannot interpret 'typing.Optional[str]' as a data type.
I need a function or method that maps Optional[X] -> X. Any suggestions other than using repr with a regex?
As long as Optional[X] is equivalent to Union[X, None] (with None as the second argument), you can unpack it with get_origin and get_args:
from typing import Union, get_args, get_origin

def get_optional_arg(typ: type) -> type | None:
    # make sure typ is really Optional[...], otherwise return None
    if get_origin(typ) is Union:
        args = get_args(typ)
        # Optional[X] is stored as Union[X, None]: NoneType is the second argument
        if len(args) == 2 and args[1] is type(None):
            return args[0]
    return None
# fall back to the original annotation for fields that are not Optional
col_types = {
    k: get_optional_arg(f.annotation) or f.annotation
    for k, f in MyModel.model_fields.items()
}