I am parsing data from an API (https://rapidapi.com/apidojo/api/tasty, to be exact). I have two classes: Tip (represents a review on a recipe) and TipMetadata (represents all of the metadata associated with the Tip). I don't technically have to do this, but I prefer it this way. The problem is, when I get the data from the API, it isn't structured the same way I want it to be in my code.
Here is a simplified example to get my point across:
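# tip.py
from datetime import datetime

from pydantic import BaseModel

# TipMetadata is defined first so that Tip can reference it.
class TipMetadata(BaseModel):
    posted_at: datetime
    updated_at: datetime
    id: int

class Tip(BaseModel):
    author: str
    body: str
    metadata: TipMetadata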
But here is what the data I am getting looks like:
{
    "author": "John Smith",
    "body": "Nice recipe!",
    "posted_at": 1716874324,
    "updated_at": 1716738295,
    "id": 8011
}
How do I make Pydantic parse posted_at, updated_at, and id into a TipMetadata object instead of ignoring them and complaining that there was no TipMetadata object in the data?
I've tried looking at the documentation and I found things called field_validator and custom types, but I feel like there is probably a simpler way to implement this and I just don't know what that is.
Edit:
I ended up doing the following:
import typing
from datetime import datetime

import pydantic
from pydantic import BaseModel

class TipMetadata(BaseModel):
    posted_at: datetime
    updated_at: datetime
    id: int

class Tip(BaseModel):
    author: str
    body: str
    metadata: TipMetadata

    @pydantic.model_validator(mode="before")
    @classmethod
    def nest_metadata(
        cls, values: dict[str, typing.Any]
    ) -> dict[str, typing.Any]:
        # Move the flat metadata keys into a nested "metadata" dict
        # before Pydantic validates the fields.
        metadata_fields = set(TipMetadata.model_fields)
        metadata_data = {
            key: values.pop(key) for key in metadata_fields if key in values
        }
        values["metadata"] = metadata_data
        return values
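As a quick check, here is a self-contained sketch of the same idea run against the sample payload from the question. It assumes Pydantic v2 (model_validator and model_fields are v2 names). Two things are my own additions rather than part of the code above: the validator copies the incoming dict so the caller's payload is not mutated, and it skips the restructuring when a "metadata" key is already present, so already-nested data still validates.

```python
from datetime import datetime
from typing import Any

from pydantic import BaseModel, model_validator


class TipMetadata(BaseModel):
    posted_at: datetime
    updated_at: datetime
    id: int


class Tip(BaseModel):
    author: str
    body: str
    metadata: TipMetadata

    @model_validator(mode="before")
    @classmethod
    def nest_metadata(cls, values: Any) -> Any:
        # Only restructure plain dicts that are still flat (assumption:
        # anything that already has "metadata" is left alone).
        if isinstance(values, dict) and "metadata" not in values:
            values = dict(values)  # shallow copy; the caller's dict is untouched
            values["metadata"] = {
                key: values.pop(key)
                for key in TipMetadata.model_fields
                if key in values
            }
        return values


payload = {
    "author": "John Smith",
    "body": "Nice recipe!",
    "posted_at": 1716874324,
    "updated_at": 1716738295,
    "id": 8011,
}

tip = Tip.model_validate(payload)
print(tip.metadata.id)
print(tip.metadata.posted_at)
```

The integer timestamps are accepted because Pydantic interprets ints for datetime fields as Unix time, so no extra conversion is needed in the validator.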
Devil's advocate: given that you're going to have to update the data models every time the API changes anyway, you might as well define a hardcoded function that does the transformation for you. Assuming you've parsed the API response into a dict:
def response_to_classes(resp: dict) -> Tip:
    out = Tip(
        author=resp['author'],
        body=resp['body'],
        metadata=TipMetadata(
            posted_at=resp['posted_at'],
            updated_at=resp['updated_at'],
            id=resp['id'],
        ),
    )
    return out
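For completeness, a minimal runnable sketch of that function with the models and the sample payload from the question, again assuming Pydantic v2. The variable names sample and missing_key are just for illustration. One trade-off worth knowing: because the keys are read with resp[...], a missing key raises a plain KeyError before Pydantic ever sees the data, instead of a ValidationError.

```python
from datetime import datetime

from pydantic import BaseModel


class TipMetadata(BaseModel):
    posted_at: datetime
    updated_at: datetime
    id: int


class Tip(BaseModel):
    author: str
    body: str
    metadata: TipMetadata


def response_to_classes(resp: dict) -> Tip:
    # Explicitly map the flat API response onto the nested models.
    return Tip(
        author=resp["author"],
        body=resp["body"],
        metadata=TipMetadata(
            posted_at=resp["posted_at"],
            updated_at=resp["updated_at"],
            id=resp["id"],
        ),
    )


sample = {
    "author": "John Smith",
    "body": "Nice recipe!",
    "posted_at": 1716874324,
    "updated_at": 1716738295,
    "id": 8011,
}

tip = response_to_classes(sample)
print(tip.metadata.id)

# A missing key fails in the dict lookup, not in Pydantic validation.
missing_key = None
try:
    response_to_classes({"author": "John Smith"})
except KeyError as exc:
    missing_key = exc.args[0]
print(missing_key)
```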