Tags: python, fastapi, python-typing, pydantic

Validation errors list unrelated fields in the 422 Unprocessable Entity response (FastAPI, Pydantic)


class Foo(BaseModel):
    template: str
    body: FooBody

class Bar(BaseModel):
    template: str
    body: BarBody

class Xyz(BaseModel):
    template: str
    body: XyzBody

@router.post("/something", status_code=status.HTTP_200_OK)
async def get_pdf(
    request: Request,
    request_body: Union[Foo, Bar, Xyz],
):
    ...  # handler body omitted in the question

In the code snippet above, the request body can be any one of three types, expressed with Union.

The code works perfectly for valid bodies of those types. However, if a single field is missing, the 422 validation error lists many missing fields, not just the one that is actually missing.

What could be the cause of this? Am I using Union incorrectly?

My goal is to allow only the listed models (Foo, Bar, Xyz). If the request matches Foo but a certain field is missing, the error should show only that field, instead of every field of Bar and Xyz plus the one missing in Foo.

Minimal Reproducible Example

from typing import Union

from fastapi import FastAPI, status
from pydantic import BaseModel

app = FastAPI(debug=True)


class FooBody(BaseModel):
    foo1: str
    foo2: int
    foo3: str

class Foo(BaseModel):
    temp: str
    body: FooBody

class BarBody(BaseModel):
    bar1: str
    bar2: int
    bar3: str

class Bar(BaseModel):
    temp: str
    body: BarBody

class XyzBody(BaseModel):
    xyz1: str
    xyz2: int
    xyz3: str

class Xyz(BaseModel):
    temp: str
    body: XyzBody

@app.get("/type", status_code=status.HTTP_200_OK)
def health(response_body: Union[Foo, Bar, Xyz]):
    return response_body

So if I send

{
    "temp": "xyz",
    "body": {
        "foo1": "ok",
        "foo2": 1,
        "foo3": "2"
    }
}

It works as expected. But if I omit one parameter, say foo3, from the request body, I don't get a validation error saying only that foo3 is missing. Instead I get

{
    "detail": [
        {
            "loc": [
                "body",
                "body",
                "foo3"
            ],
            "msg": "field required",
            "type": "value_error.missing"
        },
        {
            "loc": [
                "body",
                "body",
                "bar1"
            ],
            "msg": "field required",
            "type": "value_error.missing"
        },
        {
            "loc": [
                "body",
                "body",
                "bar2"
            ],
            "msg": "field required",
            "type": "value_error.missing"
        },
        {
            "loc": [
                "body",
                "body",
                "bar3"
            ],
            "msg": "field required",
            "type": "value_error.missing"
        },
        {
            "loc": [
                "body",
                "body",
                "xyz1"
            ],
            "msg": "field required",
            "type": "value_error.missing"
        },
        {
            "loc": [
                "body",
                "body",
                "xyz2"
            ],
            "msg": "field required",
            "type": "value_error.missing"
        },
        {
            "loc": [
                "body",
                "body",
                "xyz3"
            ],
            "msg": "field required",
            "type": "value_error.missing"
        }
    ]
}

That is, every field of every class mentioned in the Union.

Am I using Union wrong?

What I need is for the endpoint to accept only bodies matching the classes I list. If it detects that the body is a Foo, it should run only the validations of class Foo, not those of every class in the Union.


Solution

  • I will try to rephrase and condense your question because it contains a lot of code that is entirely unrelated to the actual underlying problem of validation that you came across.


    MRE

    Here is what the problem actually boils down to:

    from pydantic import BaseModel, ValidationError
    
    
    class Foo(BaseModel):
        foo1: str
        foo2: int
    
    
    class Bar(BaseModel):
        bar1: bool
        bar2: bytes
    
    
    class Model(BaseModel):
        data: Foo | Bar
    
    
    def test(model: type[BaseModel], data: dict[str, object]) -> None:
        try:
            instance = model.parse_obj({"data": data})
        except ValidationError as error:
            print(error.json(indent=4))
        else:
            print(instance.json(indent=4))
    
    
    if __name__ == "__main__":
        incomplete_test_data = {"foo1": "a"}
        valid_test_data = incomplete_test_data | {"foo2": 1}
        test(Model, valid_test_data)
        test(Model, incomplete_test_data)
    

    The output of the first test call is as expected:

    {
        "data": {
            "foo1": "a",
            "foo2": 1
        }
    }
    

    But the second one gives us the following:

    [
        {
            "loc": [
                "data",
                "foo2"
            ],
            "msg": "field required",
            "type": "value_error.missing"
        },
        {
            "loc": [
                "data",
                "bar1"
            ],
            "msg": "field required",
            "type": "value_error.missing"
        },
        {
            "loc": [
                "data",
                "bar2"
            ],
            "msg": "field required",
            "type": "value_error.missing"
        }
    ]
    

    This is not what we want. We want the validation error caused by the second call to recognize that validation should be done via the Foo model and only foo2 is missing, so that it contains only one actual error:

    [
        {
            "loc": [
                "data",
                "foo2"
            ],
            "msg": "field required",
            "type": "value_error.missing"
        }
    ]
    

    How can this be accomplished?


    Answer

    This is exactly what discriminated unions are for. They are also part of the OpenAPI specification. However, as the documentation shows, a discriminated union requires a discriminator field to be added to each of the models in that union.

    Here is how this could look:

    from typing import Literal
    from pydantic import BaseModel, Field, ValidationError
    
    
    ...
    
    
    class FooDisc(BaseModel):
        data_type: Literal["foo"]
        foo1: str
        foo2: int
    
    
    class BarDisc(BaseModel):
        data_type: Literal["bar"]
        bar1: bool
        bar2: bytes
    
    
    class ModelDisc(BaseModel):
        data: FooDisc | BarDisc = Field(..., discriminator="data_type")
    
    
    if __name__ == "__main__":
        ...
        incomplete_test_data = {
            "data_type": "foo",
            "foo1": "a",
        }
        valid_test_data = incomplete_test_data | {"foo2": 1}
        test(ModelDisc, valid_test_data)
        test(ModelDisc, incomplete_test_data)
    

    Now the output of the first test call is this:

    {
        "data": {
            "data_type": "foo",
            "foo1": "a",
            "foo2": 1
        }
    }
    

    And the second call gives just the following:

    [
        {
            "loc": [
                "data",
                "FooDisc",
                "foo2"
            ],
            "msg": "field required",
            "type": "value_error.missing"
        }
    ]
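
The difference between the two error outputs can be modelled in a few lines of plain Python. This is only a simplified sketch, not Pydantic's actual implementation; `REQUIRED`, `missing_fields`, `validate_plain_union`, and `validate_discriminated` are made-up names for illustration. A plain union cannot know which member was intended, so when every member fails it reports the errors of all of them. A discriminated union looks up a single member by its tag and reports only that member's errors.

```python
# Hypothetical schemas: member tag -> required field names.
REQUIRED = {
    "foo": ["foo1", "foo2"],
    "bar": ["bar1", "bar2"],
}


def missing_fields(member: str, data: dict) -> list:
    """Return the required fields of `member` that are absent from `data`."""
    return [name for name in REQUIRED[member] if name not in data]


def validate_plain_union(data: dict) -> list:
    """Try every member in order; if none matches, report all collected errors."""
    collected = []
    for member in REQUIRED:
        missing = missing_fields(member, data)
        if not missing:
            return []  # the first member that validates wins
        collected.extend(missing)
    return collected


def validate_discriminated(data: dict, tag_field: str = "data_type") -> list:
    """Pick exactly one member by its tag and report only that member's errors."""
    member = data.get(tag_field)
    if member not in REQUIRED:
        return [tag_field]  # unknown or missing tag
    return missing_fields(member, data)


incomplete = {"data_type": "foo", "foo1": "a"}
print(validate_plain_union(incomplete))    # ['foo2', 'bar1', 'bar2']
print(validate_discriminated(incomplete))  # ['foo2']
```

The two printed lists mirror the two error responses shown earlier: three missing fields without a discriminator, one with it.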
    

    As the linked Pydantic docs show, more than two models and more complex/nested constructs using discriminated unions are possible as well.
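
Applied to the three nested models from the question, this could look roughly as follows. It is a sketch under assumptions: the discriminator field name `kind`, the wrapper model `Envelope`, and the helper `error_fields` are hypothetical and not part of the original code, and `Field(..., discriminator=...)` needs a Pydantic release that supports discriminated unions (1.9 or later, as far as I know).

```python
from typing import Literal, Union

from pydantic import BaseModel, Field, ValidationError


class FooBody(BaseModel):
    foo1: str
    foo2: int
    foo3: str


class BarBody(BaseModel):
    bar1: str
    bar2: int
    bar3: str


class XyzBody(BaseModel):
    xyz1: str
    xyz2: int
    xyz3: str


class Foo(BaseModel):
    kind: Literal["foo"]  # hypothetical discriminator field
    temp: str
    body: FooBody


class Bar(BaseModel):
    kind: Literal["bar"]
    temp: str
    body: BarBody


class Xyz(BaseModel):
    kind: Literal["xyz"]
    temp: str
    body: XyzBody


class Envelope(BaseModel):
    # Hypothetical wrapper; the union member is selected by the `kind` tag.
    payload: Union[Foo, Bar, Xyz] = Field(..., discriminator="kind")


def error_fields(raw: dict) -> list:
    """Return the last location element of every validation error."""
    try:
        Envelope(**raw)
    except ValidationError as error:
        return [err["loc"][-1] for err in error.errors()]
    return []


incomplete = {
    "payload": {"kind": "foo", "temp": "xyz", "body": {"foo1": "ok", "foo2": 1}}
}
print(error_fields(incomplete))  # ['foo3']
```

Only the field missing from FooBody is reported, because the `kind` tag rules out Bar and Xyz before their fields are ever checked.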

    While the added field may seem annoying, it is the only generally reliable way to convey which model/schema to use. If you want to get fancy with your specific situation and not use discriminators, you can always write your own validator with pre=True on the model containing the (regular) union and parse the data for that field inside that validator based on, for example, the keys you find in the dictionary passed to it. But I would advise against this because it leaves a lot of room for errors. Discriminated unions were introduced for a reason, and this problem is exactly that reason.
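
To make the risk of the key-based alternative concrete, here is a hedged pure-Python sketch of the kind of selection logic such a pre-validator might run. `KNOWN_KEYS` and `guess_member` are hypothetical names, and the heuristic (pick the member whose field names overlap most with the incoming keys) is just one possible choice, not something Pydantic provides.

```python
from typing import Optional

# Hypothetical: member name -> field names it declares.
KNOWN_KEYS = {
    "Foo": {"foo1", "foo2"},
    "Bar": {"bar1", "bar2"},
}


def guess_member(data: dict) -> Optional[str]:
    """Guess the intended member from key overlap; None if there is no clear winner."""
    incoming = set(data)
    scores = {name: len(keys & incoming) for name, keys in KNOWN_KEYS.items()}
    best = max(scores.values())
    winners = [name for name, score in scores.items() if score == best]
    if best == 0 or len(winners) > 1:
        return None  # nothing matched, or several members matched equally well
    return winners[0]


print(guess_member({"foo1": "a"}))                # Foo
print(guess_member({"foo1": "a", "bar1": True}))  # None
print(guess_member({}))                           # None
```

The second and third calls show where the guesswork breaks down: mixed or empty input gives no clear answer, and two models that share field names would be ambiguous by construction. An explicit discriminator avoids all of these cases.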