I need to retrieve a list of records from a MongoDB collection that includes an image field, and convert them into Pydantic models.
Is there a corresponding bson.Binary type in Pydantic? Or a way to convert the binary data to a type that can be validated by Pydantic?
I have tried with "bytes":
class Category(BaseModel):
    category_name: str = Field(...)
    category_icon_binary: Optional[bytes] = Field(...)
but I get:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 0: invalid utf-8
I have tried with "bson.Binary" and arbitrary_types_allowed:
class Category(BaseModel):
    category_name: str = Field(...)
    category_icon_binary: Optional[Binary] = Field(...)
    model_config = ConfigDict(arbitrary_types_allowed=True,....)
and I get:
Input should be an instance of Binary [type=is_instance_of, input_value=b'\x89PNG\r\n\x1a\n\x00\x...0\x00\x00IEND\xaeB`\x82', input_type=bytes]
Here's the model that includes a bytes field (or Binary):
# Pydantic model
from pydantic import BaseModel, Field, BeforeValidator
from typing import Optional, Annotated, List
from dataclasses import dataclass

PyObjectId = Annotated[str, BeforeValidator(str)]

class Category(BaseModel):
    category_id: Optional[PyObjectId] = Field(alias="_id", default=None)
    category_name: str = Field(...)
    category_icon_binary: Optional[bytes] = Field(...)
    content_type: Optional[str] = Field(...)

class CategoriesCollection(BaseModel):
    categories: List[Category]
Here's the endpoint using the model:
# API endpoint
@router.get("/", response_model=list[Category], response_model_by_alias=False,)
def get():
    # THIS WORKS FINE: there is no validation error in loading bytes in the model list[Category]
    result = CategoriesCollection(categories=categories_collection.find())
    # THIS RAISES validation error UnicodeDecodeError
    return result.categories
A screenshot (not reproduced here) shows that before the return there is no validation error and the binary is properly loaded.
A second screenshot (not reproduced here) shows how the document is saved in the collection.
Based on the added code and screenshots, the issue is not exactly with Pydantic. As you've pointed out, the Pydantic models for Category and CategoriesCollection are being loaded correctly and without error.
The issue occurs because you are trying to return binary data in JSON, which is not allowed. Even when using a response_model, FastAPI/Pydantic have no default handler that makes arbitrary bytes JSON-safe, unlike datetime, UUID, etc. The bytes are decoded as UTF-8, which fails for image data such as a PNG, hence the UnicodeDecodeError.
You'll need to use Base64 encoding, or Base85, etc. to convert the bytes to a string, which is JSON-acceptable, with @field_serializer:
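import base64
from pydantic import field_serializer

class Category(BaseModel):
    category_id: Optional[PyObjectId] = Field(alias="_id", default=None)
    category_name: str = Field(...)
    category_icon_binary: Optional[bytes] = Field(...)
    content_type: Optional[str] = Field(...)

    # Applied only when serializing to JSON, and skipped when the value is None
    @field_serializer('category_icon_binary', when_used='json-unless-none')
    def bytes_as_base64(self, b: bytes) -> str:
        # b64encode returns bytes, so decode to get a JSON-safe str
        return base64.b64encode(b).decode('ascii')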
Note: The response handler may still have an issue due to bytes vs str. If it does, update the model (importing Union from typing) to have
    category_icon_binary: Optional[Union[bytes, str]] = Field(...)
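To see what the serializer does outside FastAPI, here is a minimal, self-contained sketch. It assumes Pydantic v2 and uses a cut-down Category model (no _id alias, no content_type) with an eight-byte PNG signature standing in for a real icon; the variable names are made up for illustration.

```python
import base64
import json
from typing import Optional

from pydantic import BaseModel, field_serializer


class Category(BaseModel):
    # Cut-down, illustrative version of the model from the question
    category_name: str
    category_icon_binary: Optional[bytes] = None

    # Runs only for JSON output, and only when the value is not None
    @field_serializer("category_icon_binary", when_used="json-unless-none")
    def bytes_as_base64(self, value: bytes) -> str:
        return base64.b64encode(value).decode("ascii")


# The PNG signature is not valid UTF-8, like the icon data in the question
png_header = b"\x89PNG\r\n\x1a\n"
category = Category(category_name="books", category_icon_binary=png_header)

# Python-mode dump keeps the raw bytes; JSON-mode dump emits a Base64 string
as_python = category.model_dump()
as_json = category.model_dump_json()

# A client reverses the encoding to get the original bytes back
payload = json.loads(as_json)
restored = base64.b64decode(payload["category_icon_binary"])
print(as_json)
```

The client has to know that the field is Base64-encoded, so returning content_type alongside it, as the original model does, is useful when rebuilding the image.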
Alternatively, you can create a class like OutputCategory which inherits from Category, overrides category_icon_binary, and converts it to Base64, hex, or whatever you choose, like the In/OutUser example in the FastAPI docs:
class OutputCategory(Category):  # inherit from Category
    category_icon_binary: Optional[str] = Field(...)  # note the change in type
    # and add a validator which will convert the bin to hex/base64
    ...
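One possible way to fill in that validator is sketched below. This is an assumption about how it could look, not the only option: it assumes Pydantic v2, uses a cut-down Category, and picks Base64 with a "before" validator so the bytes are converted before the str check runs. The name bytes_to_base64 is hypothetical.

```python
import base64
from typing import Optional

from pydantic import BaseModel, field_validator


class Category(BaseModel):
    # Cut-down, illustrative version of the stored model
    category_name: str
    category_icon_binary: Optional[bytes] = None


class OutputCategory(Category):
    # Same field name, but exposed to clients as text
    category_icon_binary: Optional[str] = None

    # mode="before" sees the raw input, so bytes can be turned into a
    # Base64 string before Pydantic validates the field as str
    @field_validator("category_icon_binary", mode="before")
    @classmethod
    def bytes_to_base64(cls, value):
        if isinstance(value, (bytes, bytearray)):
            return base64.b64encode(value).decode("ascii")
        return value


png_header = b"\x89PNG\r\n\x1a\n"
stored = Category(category_name="books", category_icon_binary=png_header)

# Build the output model from the stored one; the validator does the conversion
out = OutputCategory(**stored.model_dump())
print(out.model_dump_json())
```

With this approach the endpoint would declare response_model=list[OutputCategory], so the stored model keeps raw bytes and only the response model carries the encoded string.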