Tags: python, pandas, python-typing

Getting the argument of typing.Optional in Python


I would like to create a typed DataFrame from a Pydantic BaseModel class, let's call it MyModel, that has Optional fields. As I create multiple instances of MyModel, some will have None in their Optional fields, and if I initialize a DataFrame from such rows, the resulting columns may have inconsistent dtypes. I would therefore like to convert Optional[TypeX] to TypeX, e.g.:

import pydantic
import pandas as pd
import numpy as np
from typing import Optional

class MyModel(pydantic.BaseModel):
    thisfield: int
    thatfield: Optional[str]
    ...

col_types = {kk: ff.annotation for kk, ff in MyModel.model_fields.items()}


pd.DataFrame(np.empty(0, dtype=[tuple(tt) for tt in col_types.items()]))

This fails with TypeError: Cannot interpret 'typing.Optional[str]' as a data type.

I need a function or method that maps Optional[X] -> X. Any suggestions other than parsing the repr with a regex?


Solution

  • Optional[X] is equivalent to Union[X, None], so you can inspect it with typing.get_origin and typing.get_args:

    from typing import Union, get_args, get_origin
    
    def get_optional_arg(typ: type) -> type | None:
        # Return X if typ is really Optional[X] (i.e. Union[X, None]);
        # otherwise return None.
        if get_origin(typ) is Union:
            args = get_args(typ)
            if len(args) == 2 and args[1] is type(None):
                return args[0]
        return None
    
    col_types = {
        k: get_optional_arg(f.annotation) or f.annotation
        for k, f in MyModel.model_fields.items()
    }
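
One caveat with the check above: on Python 3.10 to 3.13, an annotation written as `X | None` reports `types.UnionType` as its origin rather than `typing.Union`, and `Union[None, X]` puts `NoneType` first in its arguments. The following is a minimal sketch of a more tolerant variant, assuming you want both spellings and either ordering to be unwrapped. The name `unwrap_optional` is hypothetical, not part of `typing` or Pydantic, and unlike `get_optional_arg` it returns the input unchanged when it is not an Optional.

```python
import types
from typing import Optional, Union, get_args, get_origin

# types.UnionType (the runtime type of `X | None`) only exists on Python 3.10+,
# so fall back to typing.Union on older interpreters.
_UNION_ORIGINS = (Union, getattr(types, "UnionType", Union))


def unwrap_optional(typ):
    """Return X for Optional[X], Union[None, X] or X | None; else return typ unchanged."""
    if get_origin(typ) in _UNION_ORIGINS:
        args = get_args(typ)
        non_none = [a for a in args if a is not type(None)]
        # Exactly two members, one of which is NoneType, means this is an Optional.
        if len(args) == 2 and len(non_none) == 1:
            return non_none[0]
    return typ


print(unwrap_optional(Optional[str]))     # Optional[str] is unwrapped to str
print(unwrap_optional(Union[None, int]))  # None in first position is handled too
print(unwrap_optional(int))               # non-Optional types pass through
```

With this variant the `or f.annotation` fallback is no longer needed: the dictionary comprehension from the answer can call `unwrap_optional(f.annotation)` directly for each field of `MyModel`. Unions with more than one non-None member, such as `Union[int, str, None]`, are deliberately left untouched because they have no single dtype to map to.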