
Pydantic custom type for MongoDB Binary field


I need to retrieve a list of records from a mongodb collection that includes an image field, and convert them into Pydantic models.

Is there a corresponding bson.Binary type in Pydantic? Or a way to convert the binary data to a type that can be validated by Pydantic?

I have tried with "bytes":

class Category(BaseModel):
    category_name: str = Field(...)
    category_icon_binary: Optional[bytes] = Field(...)

but I get:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 0: invalid utf-8

I have tried with "bson.Binary" and arbitrary_types_allowed:

class Category(BaseModel):
    category_name: str = Field(...)
    category_icon_binary: Optional[Binary] = Field(...)
    model_config = ConfigDict(arbitrary_types_allowed=True,....)

and I get:

  Input should be an instance of Binary [type=is_instance_of, input_value=b'\x89PNG\r\n\x1a\n\x00\x...0\x00\x00IEND\xaeB`\x82', input_type=bytes]

Here's the model that includes a bytes field (or Binary):

# Pydantic model

from pydantic import BaseModel, Field, BeforeValidator
from typing import Optional, Annotated, List
from dataclasses import dataclass

PyObjectId = Annotated[str, BeforeValidator(str)]

class Category(BaseModel):
    category_id: Optional[PyObjectId] = Field(alias="_id", default=None)
    category_name: str = Field(...)
    category_icon_binary: Optional[bytes] = Field(...)
    content_type: Optional[str] = Field(...)

class CategoriesCollection(BaseModel):
    categories: List[Category]

Here's the endpoint using the model:

# API endpoint

@router.get("/", response_model=list[Category], response_model_by_alias=False)
def get():
    # THIS WORKS FINE: there is no validation error in loading bytes into the model list[Category]
    result = CategoriesCollection(categories=categories_collection.find())

    # THIS RAISES a validation error: UnicodeDecodeError
    return result.categories

A screenshot (not reproduced here) shows that before the return there is no validation error and the binary is loaded properly.

A second screenshot (not reproduced here) shows how the document is saved in the collection.


Solution

  • Based on the added code and screenshots, the issue is not exactly with Pydantic validation. As you've pointed out, the Pydantic models for Category and CategoriesCollection are loaded correctly and without error.

    The issue occurs because you are trying to return binary data in JSON, which is not allowed. Even with a response_model, FastAPI/Pydantic have no default handler that turns arbitrary binary data into JSON, unlike datetime, UUID, etc.; the UnicodeDecodeError comes from the bytes being treated as UTF-8 text.
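As a minimal sketch of that failure using only the standard library (the helper names `try_utf8` and `try_json` are made up for illustration), the PNG signature bytes from the question can be neither decoded as UTF-8 nor passed to `json.dumps`, while a text encoding of the same bytes survives a JSON round trip:

```python
import base64
import json

# First 8 bytes of a PNG file; 0x89 is not a valid UTF-8 start byte.
raw = b"\x89PNG\r\n\x1a\n"


def try_utf8(data: bytes) -> str:
    """Return the decoded text, or the name of the exception raised."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return type(exc).__name__


def try_json(value) -> str:
    """Return the JSON text, or the name of the exception raised."""
    try:
        return json.dumps(value)
    except TypeError as exc:
        return type(exc).__name__


utf8_result = try_utf8(raw)            # decoding the raw bytes as text fails
json_result = try_json({"icon": raw})  # the json module rejects bytes outright

# Encoding the bytes as ASCII text first makes them JSON-safe and reversible.
encoded = base64.b64encode(raw).decode("ascii")
payload = json.dumps({"icon": encoded})
restored = base64.b64decode(json.loads(payload)["icon"])
```

The client has to apply the matching decode step to get the original image bytes back.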

    You'll need to convert the bytes to a JSON-safe string using Base64, Base85, or a similar encoding, for example with @field_serializer:

    import base64

    from pydantic import field_serializer

    class Category(BaseModel):
        category_id: Optional[PyObjectId] = Field(alias="_id", default=None)
        category_name: str = Field(...)
        category_icon_binary: Optional[bytes] = Field(...)
        content_type: Optional[str] = Field(...)

        @field_serializer('category_icon_binary', when_used='json-unless-none')
        def bytes_as_base64(self, b: bytes) -> str:
            # b64encode returns bytes; decode to str so the value is JSON-safe
            return base64.b64encode(b).decode('ascii')
    

    Note: The response handler may still have an issue due to bytes vs. str. If so, update the model field to:

    category_icon_binary: Optional[Union[bytes, str]] = Field(...)
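Another option, not part of the answer above and assuming Pydantic v2, is the `ser_json_bytes` config setting, which changes how every bytes field on the model is rendered in JSON mode. The sketch below uses a hypothetical `IconModel` with only the relevant fields. As far as I know, the `"base64"` setting uses the URL-safe Base64 alphabet, so the example decodes with `urlsafe_b64decode`.

```python
import base64
from typing import Optional

from pydantic import BaseModel, ConfigDict


class IconModel(BaseModel):  # hypothetical name, trimmed to the relevant fields
    # Render bytes fields as Base64 text whenever the model is dumped in JSON mode.
    model_config = ConfigDict(ser_json_bytes="base64")

    category_name: str
    category_icon_binary: Optional[bytes] = None


raw = b"\x89PNG\r\n\x1a\n"
item = IconModel(category_name="demo", category_icon_binary=raw)

as_python = item.model_dump()           # bytes stay bytes in Python mode
as_json = item.model_dump(mode="json")  # bytes become a Base64 string

# Assumption: the URL-safe alphabet is used, so decode accordingly.
restored = base64.urlsafe_b64decode(as_json["category_icon_binary"])
```

This avoids a per-field serializer, at the cost of applying the same encoding to all bytes fields on the model.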
    

    Alternatively, you can create a class like OutputCategory which inherits from Category, overrides category_icon_binary, and converts it to Base64, hex, or whatever encoding you choose, similar to the UserIn/UserOut example in the FastAPI docs:

    class OutputCategory(Category):  # inherit from Category
        category_icon_binary: Optional[str] = Field(...)  # note the change in type
        
        # and add a validator which will convert the binary data to hex/Base64
       ...
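One way the elided validator could look, as a sketch only: it assumes Pydantic v2, uses a trimmed copy of Category without the `_id` alias, and the validator name `encode_icon` is made up. A "before" validator converts raw bytes to Base64 text before the field is validated as a str.

```python
import base64
from typing import Optional

from pydantic import BaseModel, field_validator


class Category(BaseModel):  # trimmed copy of the model, for illustration only
    category_name: str
    category_icon_binary: Optional[bytes] = None
    content_type: Optional[str] = None


class OutputCategory(Category):
    # Same field name, but exposed as Base64 text instead of raw bytes.
    category_icon_binary: Optional[str] = None

    @field_validator("category_icon_binary", mode="before")
    @classmethod
    def encode_icon(cls, value):
        # Runs before str validation: turn raw bytes into Base64 text.
        if isinstance(value, (bytes, bytearray)):
            return base64.b64encode(value).decode("ascii")
        return value  # None or an already-encoded string passes through


loaded = Category(
    category_name="demo",
    category_icon_binary=b"\x89PNG\r\n\x1a\n",
    content_type="image/png",
)
out = OutputCategory.model_validate(loaded.model_dump())
as_json = out.model_dump(mode="json")
```

With this split, Category keeps the raw bytes for internal use and OutputCategory is the shape that goes over the wire, for example as the endpoint's response_model.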