I have the following string that my API is receiving:
'{"data": 123, "inner_data": "{\\"color\\": \\"RED\\"}"}'
My goal is to build a Pydantic model that can validate the outer and inner data fields.
So I built the following models:
from pydantic import BaseModel

class InnerData(BaseModel):
    color: str

class Expected(BaseModel):
    data: int
    inner_data: InnerData
But when I run the following:
incoming_json_string = '{"data": 123, "inner_data": "{\\"color\\": \\"RED\\"}"}'
expected = Expected.model_validate_json(incoming_json_string)
I get:
Traceback (most recent call last):
  File ".../site-packages/pydantic/main.py", line 532, in model_validate_json
    return cls.__pydantic_validator__.validate_json(json_data, strict=strict, context=context)
pydantic_core._pydantic_core.ValidationError: 1 validation error for Expected
inner_data
  Input should be an object [type=model_type, input_value='{"color": "RED"}', input_type=str]
    For further information visit https://errors.pydantic.dev/2.5/v/model_type
The link in the traceback doesn't help: it only tells me the input is a string when it should be a model. But building that model is exactly what I'm trying to do by declaring inner_data: InnerData. What should I try?
The way you've constructed your models, you can validate a nested JSON object, like this:
{
  "data": 123,
  "inner_data": {
    "color": "RED"
  }
}
Pydantic will happily consume that JSON into your Expected and InnerData classes:
>>> incoming_json_string = '{"data": 123, "inner_data": {"color": "RED"}}'
>>> expected = Expected.model_validate_json(incoming_json_string)
>>> expected
Expected(data=123, inner_data=InnerData(color='RED'))
But if you want inner_data to receive a JSON string rather than an object, you would need to explicitly handle that situation. You could use a BeforeValidator, like this:
from typing import Annotated

from pydantic import BaseModel, BeforeValidator

class InnerData(BaseModel):
    color: str

class Expected(BaseModel):
    data: int
    # Decode the JSON string into an InnerData instance before field validation runs.
    inner_data: Annotated[InnerData, BeforeValidator(InnerData.model_validate_json)]

incoming_json_string = '{"data": 123, "inner_data": "{\\"color\\": \\"RED\\"}"}'
expected = Expected.model_validate_json(incoming_json_string)
Given a JSON object containing a nested JSON string, like this:
{
  "data": 123,
  "inner_data": "{\"color\": \"RED\"}"
}
The validator will decode the JSON string so that the deserialized result matches what Pydantic expects for InnerData:
>>> incoming_json_string = '{"data": 123, "inner_data": "{\\"color\\": \\"RED\\"}"}'
>>> expected = Expected.model_validate_json(incoming_json_string)
>>> expected
Expected(data=123, inner_data=InnerData(color='RED'))
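As an alternative sketch, Pydantic also provides a Json type that can stand in for a hand-written validator here. This assumes inner_data always arrives as a JSON-encoded string. As far as I know, Json does not accept an already-decoded object, so it would not be a drop-in fit for a field that receives both forms.

```python
from pydantic import BaseModel, Json


class InnerData(BaseModel):
    color: str


class Expected(BaseModel):
    data: int
    # Json[InnerData] expects a JSON-encoded string, decodes it,
    # and then validates the decoded value as InnerData.
    inner_data: Json[InnerData]


incoming_json_string = '{"data": 123, "inner_data": "{\\"color\\": \\"RED\\"}"}'
expected = Expected.model_validate_json(incoming_json_string)
print(expected.inner_data.color)
```

Which of the two to use is mostly a matter of taste: Json states the intent in the type annotation, while BeforeValidator leaves room for custom pre-processing.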
I don't know the details of the problem you're trying to solve, but in most cases you're better off keeping your models as they are in your question and avoiding JSON-encoded strings embedded inside a JSON object.
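If you happen to control the code that produces the payload (an assumption on my part), the embedded string usually comes from serializing the inner data twice. A minimal sketch of the difference, using only the standard json module and made-up variable names:

```python
import json

inner = {"color": "RED"}

# Double-encoded: the inner dict is serialized first, so it ends up
# as a JSON string inside the outer object.
double_encoded = json.dumps({"data": 123, "inner_data": json.dumps(inner)})

# Nested: the inner dict is passed through as-is and serialized once.
nested = json.dumps({"data": 123, "inner_data": inner})

print(type(json.loads(double_encoded)["inner_data"]))  # <class 'str'>
print(type(json.loads(nested)["inner_data"]))  # <class 'dict'>
```

The nested form is the one your original Expected model validates without any extra validator.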