Tags: python-3.x, fastapi, pydantic, starlette, pydantic-v2

Pydantic: How to validate a JSON string that contains an inner JSON string?


I have the following string that my API is receiving:

'{"data": 123, "inner_data": "{\\"color\\": \\"RED\\"}"}'

My goal is to build a pydantic model that can validate the outer and inner data fields. So I built the following models:

from pydantic import BaseModel

class InnerData(BaseModel):
    color: str

class Expected(BaseModel):
    data: int
    inner_data: InnerData

But when I run the following:

incoming_json_string = '{"data": 123, "inner_data": "{\\"color\\": \\"RED\\"}"}'
expected = Expected.model_validate_json(incoming_json_string)

I get:

Traceback (most recent call last):
  File ".../site-packages/pydantic/main.py", line 532, in model_validate_json
    return cls.__pydantic_validator__.validate_json(json_data, strict=strict, context=context)
pydantic_core._pydantic_core.ValidationError: 1 validation error for Expected
inner_data
  Input should be an object [type=model_type, input_value='{"color": "RED"}', input_type=str]
    For further information visit https://errors.pydantic.dev/2.5/v/model_type

The link in the traceback doesn't help: it only tells me that the input is a string where a model (an object) was expected. But producing a model from that string is exactly what I'm trying to achieve by declaring inner_data: InnerData. What should I try?


Solution

  • As written, your models validate a nested JSON object, like this:

    {
      "data": 123,
      "inner_data": {
        "color": "RED"
      }
    }
    

    Pydantic will happily consume that JSON into your Expected and InnerData classes:

    >>> incoming_json_string = '{"data": 123, "inner_data": {"color": "RED"}}'
    >>> expected = Expected.model_validate_json(incoming_json_string)
    >>> expected
    Expected(data=123, inner_data=InnerData(color='RED'))
    

    But if inner_data arrives as a JSON string rather than an object, you need to handle that case explicitly. You could use a BeforeValidator, like this:

    from pydantic import BaseModel, BeforeValidator
    from typing import Annotated
    
    class InnerData(BaseModel):
        color: str
    
    class Expected(BaseModel):
        data: int
        inner_data: Annotated[InnerData, BeforeValidator(InnerData.model_validate_json)]
    
    incoming_json_string = '{"data": 123, "inner_data": "{\\"color\\": \\"RED\\"}"}'
    expected = Expected.model_validate_json(incoming_json_string)
    
    

    Given a JSON object containing a nested JSON string, like this:

    {
      "data": 123,
      "inner_data": "{\"color\": \"RED\"}"
    }
    

    The validator runs before Pydantic's own validation of the field and parses the JSON string into an InnerData instance, which is what Pydantic expects for inner_data:

    >>> incoming_json_string = '{"data": 123, "inner_data": "{\\"color\\": \\"RED\\"}"}'
    >>> expected = Expected.model_validate_json(incoming_json_string)
    >>> expected
    Expected(data=123, inner_data=InnerData(color='RED'))
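
One caveat with passing InnerData.model_validate_json directly: it only accepts JSON text, so the field would reject a nested object or an existing InnerData instance. If you need to accept both shapes, here is a minimal sketch of a more tolerant validator. The helper name decode_if_json_string is my own invention for illustration, not part of Pydantic:

```python
import json
from typing import Annotated, Any

from pydantic import BaseModel, BeforeValidator


def decode_if_json_string(value: Any) -> Any:
    """Hypothetical helper: decode str/bytes input as JSON, pass anything else through."""
    if isinstance(value, (str, bytes, bytearray)):
        return json.loads(value)
    return value


class InnerData(BaseModel):
    color: str


class Expected(BaseModel):
    data: int
    # The helper runs first; Pydantic then validates its result as InnerData.
    inner_data: Annotated[InnerData, BeforeValidator(decode_if_json_string)]


# Embedded JSON string, nested JSON object, and a ready-made instance are all accepted.
from_string = Expected.model_validate_json('{"data": 123, "inner_data": "{\\"color\\": \\"RED\\"}"}')
from_object = Expected.model_validate_json('{"data": 123, "inner_data": {"color": "RED"}}')
from_python = Expected(data=123, inner_data=InnerData(color="RED"))
```

All three calls should produce the same Expected object. Whether you want this leniency depends on how much control you have over the producers of the payload.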
    

    I don't know anything about the problem you're trying to solve, but in most cases you're better off keeping the models exactly as they are in your question and changing the sender so that it doesn't embed JSON-encoded data as a string inside a JSON object.
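
If the embedded string is outside your control, Pydantic also ships a Json type that may be worth trying as an alternative to a hand-written validator. A minimal sketch, assuming Pydantic v2 and the same models as in the question:

```python
from pydantic import BaseModel, Json


class InnerData(BaseModel):
    color: str


class Expected(BaseModel):
    data: int
    # Json[InnerData] expects a JSON-encoded string and validates the decoded value as InnerData.
    inner_data: Json[InnerData]


incoming_json_string = '{"data": 123, "inner_data": "{\\"color\\": \\"RED\\"}"}'
expected = Expected.model_validate_json(incoming_json_string)
print(expected.inner_data.color)
```

The trade-off is that a field declared this way expects the string form, so payloads that carry inner_data as a nested object would need a different declaration.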