It is kind of difficult to accurately phrase my question in one sentence.
I have the following models:
from pydantic import BaseModel

class Detail1(BaseModel):
    round: bool
    volume: float

class AppleData1(BaseModel):
    origin: str
    detail: Detail1

class Detail2(BaseModel):
    round: bool
    weight: float

class AppleData2(BaseModel):
    origin: str
    detail: Detail2
Here AppleData1 has an attribute detail which is of the type Detail1, and AppleData2 has an attribute detail which is of the type Detail2. I want to make an Apple class which contains all the attributes of AppleData1 and AppleData2.
Do you have a generic approach to implement this algorithm:
Whenever AppleData1 and AppleData2 have an attribute of the same name:

If they are of the same type, use one of them. For example, AppleData1.origin and AppleData2.origin are both of the type str, so Apple.origin is also of type str.

If they are of different types, merge them. For example, AppleData1.detail and AppleData2.detail are of type Detail1 and Detail2 respectively, so Apple.detail should contain all the inner attributes.

Any common inner attribute is always for the same physical quantity, so overwriting is allowed. For example, Detail1.round and Detail2.round are both of type bool, so the resulting Apple.detail.round is also of type bool.
The end result should be equivalent to the Apple model below. (The Detail class defined below is only there to make the code complete; the generic approach should not hard-code the Detail class.)
class Detail(BaseModel):
    round: bool
    volume: float
    weight: float

class Apple(BaseModel):
    origin: str
    detail: Detail
My current solution uses multiple inheritance:

class Detail(Detail1, Detail2):
    pass

class Apple(AppleData1, AppleData2):
    origin: str
    detail: Detail

print(Apple.schema_json())
This solution works, but it is too specific. I have to pinpoint the detail attribute in AppleData1 and AppleData2 and build the Detail class specifically from Detail1 and Detail2. I also have to pinpoint that origin is a common attribute of the same type (str), which is why I hard-coded origin: str in the definition of the Apple class.
Implementing a custom recursive version of the create_model function to dynamically construct a "combined" model class should work:
from typing import TypeGuard, TypeVar

from pydantic import BaseModel, create_model
from pydantic.fields import SHAPE_SINGLETON

M = TypeVar("M", bound=BaseModel)


def is_pydantic_model(obj: object) -> TypeGuard[type[BaseModel]]:
    return isinstance(obj, type) and issubclass(obj, BaseModel)


def create_combined_model(
    __name__: str,
    /,
    model1: type[M],
    model2: type[M],
) -> type[M]:
    field_overrides = {}
    for name, field1 in model1.__fields__.items():
        field2 = model2.__fields__.get(name)
        if field2 is None:
            continue  # field only exists on model1; inheritance handles it
        if is_pydantic_model(field1.type_):
            # Both fields hold nested models: merge them recursively.
            assert field1.shape == SHAPE_SINGLETON, "No model collections allowed"
            assert is_pydantic_model(field2.type_), f"{name} with different types"
            sub_model = create_combined_model(
                f"Combined{field1.type_.__name__}{field2.type_.__name__}",
                field1.type_,
                field2.type_,
            )
            field_overrides[name] = (sub_model, field1.field_info)
        else:
            assert field1.annotation == field2.annotation, f"{name} with different types"
    return create_model(__name__, __base__=(model1, model2), **field_overrides)  # type: ignore
This incorporates the restrictions/assumptions about the models that can be combined that you elaborated on in your comments.

It does not support combining fields that are annotated with C[M], where C is any generic collection type and M is a subclass of BaseModel. That is what the SHAPE_SINGLETON check ensures. It would be possible to incorporate logic that combines the models while retaining the shape of the field (e.g. list[Detail1] and list[Detail2]), but I left that out because you did not ask for it explicitly and it is a bit more complicated; a rough sketch of how that could look follows below.
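For illustration only, here is an untested sketch of such an extension that also merges list-shaped model fields; the function name create_combined_model_with_lists and the shape handling are my own assumptions, not part of the solution above:

# Untested sketch: a variant that also merges list-shaped model fields,
# e.g. list[Detail1] and list[Detail2] -> list[CombinedDetail1Detail2].
from typing import TypeGuard, TypeVar

from pydantic import BaseModel, create_model
from pydantic.fields import SHAPE_LIST, SHAPE_SINGLETON

M = TypeVar("M", bound=BaseModel)


def is_pydantic_model(obj: object) -> TypeGuard[type[BaseModel]]:
    return isinstance(obj, type) and issubclass(obj, BaseModel)


def create_combined_model_with_lists(
    __name__: str,
    /,
    model1: type[M],
    model2: type[M],
) -> type[M]:
    field_overrides = {}
    for name, field1 in model1.__fields__.items():
        field2 = model2.__fields__.get(name)
        if field2 is None:
            continue
        if is_pydantic_model(field1.type_):
            assert is_pydantic_model(field2.type_), f"{name} with different types"
            assert field1.shape == field2.shape, f"{name} with different shapes"
            sub_model = create_combined_model_with_lists(
                f"Combined{field1.type_.__name__}{field2.type_.__name__}",
                field1.type_,
                field2.type_,
            )
            if field1.shape == SHAPE_SINGLETON:
                annotation = sub_model
            elif field1.shape == SHAPE_LIST:
                annotation = list[sub_model]  # keep the list shape, merge the item model
            else:
                raise AssertionError(f"{name}: only singleton and list shapes are handled here")
            field_overrides[name] = (annotation, field1.field_info)
        else:
            assert field1.annotation == field2.annotation, f"{name} with different types"
    return create_model(__name__, __base__=(model1, model2), **field_overrides)  # type: ignore

Back to the unextended create_combined_model from above, here is a demo: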
from pydantic import BaseModel

class AppleBase(BaseModel):
    foo: str

class DetailBase(BaseModel):
    round: bool

class Detail1(DetailBase):
    volume: float

class AppleData1(AppleBase):
    bar: int
    detail: Detail1

class Detail2(DetailBase):
    weight: float

class AppleData2(AppleBase):
    baz: float
    detail: Detail2

Apple = create_combined_model("Apple", AppleData1, AppleData2)
print(Apple.schema_json(indent=4))
{
    "title": "Apple",
    "type": "object",
    "properties": {
        "foo": {
            "title": "Foo",
            "type": "string"
        },
        "baz": {
            "title": "Baz",
            "type": "number"
        },
        "detail": {
            "$ref": "#/definitions/CombinedDetail1Detail2"
        },
        "bar": {
            "title": "Bar",
            "type": "integer"
        }
    },
    "required": [
        "foo",
        "baz",
        "detail",
        "bar"
    ],
    "definitions": {
        "CombinedDetail1Detail2": {
            "title": "CombinedDetail1Detail2",
            "type": "object",
            "properties": {
                "round": {
                    "title": "Round",
                    "type": "boolean"
                },
                "weight": {
                    "title": "Weight",
                    "type": "number"
                },
                "volume": {
                    "title": "Volume",
                    "type": "number"
                }
            },
            "required": [
                "round",
                "weight",
                "volume"
            ]
        }
    }
}
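For completeness, a quick runtime check (the field values here are made up) shows that the combined model validates nested data as expected:

# Instantiate the dynamically combined model with nested data.
apple = Apple(
    foo="gala",
    bar=1,
    baz=0.5,
    detail={"round": True, "volume": 1.2, "weight": 0.3},
)
print(apple.detail.volume, apple.detail.weight)  # 1.2 0.3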
An obvious drawback of this solution is that, because the model class is created dynamically, there is no way to properly convey the type of the resulting model to static analysis.
As I wrote it now, the function is generic to the greatest extent possible: depending on the static type checker, the return type will be inferred as either the join or the union of the two input models model1 and model2.
In the demo example this means a type checker like Mypy will infer the type of Apple to be AppleBase (the join). This is of course not wrong, but it is not as specific as we might like because it fails to account for the existence of the bar, baz, and detail attributes.
A type checker that uses unions instead might infer the type as AppleData1 | AppleData2. (I have not tested it, but I believe Pyright does this.) This may or may not be preferable: it would at least always cover the existence of a detail attribute (albeit with yet another union type of Detail1 | Detail2), but to such a type checker it would be ambiguous whether Apple has a bar or a baz attribute.
The ideal solution would be to define the return type as the intersection of the two model types passed into it. But unfortunately we do not have that typing construct (yet).
None of this affects the runtime behavior of the constructed class, of course, but it is not ideal for things like IDE auto-suggestions.
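One partial workaround (just a sketch, not verified against every type checker) is to pair the dynamic construction with static-only stand-ins under typing.TYPE_CHECKING, although that reintroduces the manual duplication you wanted to avoid:

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only seen by type checkers/IDEs; never executed at runtime.
    class Detail(Detail1, Detail2):
        pass

    class Apple(AppleData1, AppleData2):
        detail: Detail
else:
    # The actual runtime class, built dynamically.
    Apple = create_combined_model("Apple", AppleData1, AppleData2)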
Consequently, your initial explicit approach of using multiple inheritance for all the models involved is still something I would recommend, unless your models become very large/complex and numerous.