I have a simple REST API for a book store, built with FastAPI and MongoDB as the backend (I have used Motor as the driver instead of PyMongo). I have a GET endpoint that returns all the books in the database and that also supports query strings (for example, a user can search for books by a single author, by genre type, etc.).

Here is the corresponding code for this endpoint:
routers.py:

```python
from datetime import datetime
from typing import Any, Dict, List, Optional

# `router`, `models` and `mongo` are defined in the application's own modules

@router.get("/books", response_model=List[models.AllBooksResponse])
async def get_the_list_of_all_books(
    authors: Optional[str] = None,
    genres: Optional[str] = None,
    published_year: Optional[str] = None,
) -> List[Dict[str, Any]]:
    if authors is None and genres is None and published_year is None:
        all_books = [book for book in await mongo.BACKEND.get_all_books()]
    else:
        all_books = [
            book
            for book in await mongo.BACKEND.get_all_books(
                authors=authors.strip('"').split(",") if authors is not None else None,
                genres=genres.strip('"').split(",") if genres is not None else None,
                published_year=datetime.strptime(published_year, "%Y")
                if published_year is not None
                else None,
            )
        ]
    return all_books
```
The corresponding model:

```python
from typing import Optional

from pydantic import BaseModel

class AllBooksResponse(BaseModel):
    name: str
    author: str
    link: Optional[str] = None

    def __init__(self, name, author, **data):
        # base_uri() is a helper defined elsewhere in the application
        super().__init__(
            name=name, author=author, link=f"{base_uri()}book/{data['book_id']}"
        )
```
And the backend function for getting the data:

```python
from datetime import datetime
from typing import Any, Dict, List, Optional

import motor.motor_asyncio

# DB and BOOKS_COLLECTION are constants defined elsewhere in the application

class MongoBackend:
    def __init__(self, uri: str) -> None:
        self._client = motor.motor_asyncio.AsyncIOMotorClient(uri)

    async def get_all_books(
        self,
        authors: Optional[List[str]] = None,
        genres: Optional[List[str]] = None,
        published_year: Optional[datetime] = None,
    ) -> List[Dict[str, Any]]:
        find_condition = {}
        if authors is not None:
            find_condition["author"] = {"$in": authors}
        if genres is not None:
            find_condition["genres"] = {"$in": genres}
        if published_year is not None:
            find_condition["published_year"] = published_year
        cursor = self._client[DB][BOOKS_COLLECTION].find(find_condition, {"_id": 0})
        return [doc async for doc in cursor]
```
Now I want to implement pagination for this endpoint, and I have a few questions here. I think only the approach using `limit` and `skip` would work for me, because I want pagination to also work when I am using the other filter parameters (for example author and genre), and there is no way I can know the ObjectIds unless I make a first query to get the data and only then paginate. But the issue is that everywhere I look, using `limit` and `skip` is discouraged.

Can someone please let me know what the best practices are here, and whether any of them apply to my requirement and use case?

Many thanks in advance.
There is no right or wrong answer to such a question. A lot depends on the technology stack you use and the context you have, considering also the future directions of both the software you wrote and the software you rely on (Mongo).

Answering your questions:

It depends on the load you have to manage and the dev stack you use. Usually it is done at the database level, since retrieving the first 110 documents and then dropping the first 100 yourself is quite dumb and resource-consuming (the database will do it for you).
To me it seems pretty simple to do via FastAPI: just add the parameters `limit: int = 10` and `skip: int = 0` to your `get` function and use them in the filtering function of your database. FastAPI will check the data types for you, while you could check that `limit` is not negative or above, say, 100.
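For illustration, here is a minimal sketch of what that could look like, reusing the names from your snippets (the `Query` bounds of 1-100 are my own arbitrary choice, not something FastAPI imposes):

```python
from fastapi import Query

@router.get("/books", response_model=List[models.AllBooksResponse])
async def get_the_list_of_all_books(
    authors: Optional[str] = None,
    genres: Optional[str] = None,
    published_year: Optional[str] = None,
    limit: int = Query(10, ge=1, le=100),  # FastAPI rejects values outside [1, 100]
    skip: int = Query(0, ge=0),            # no negative offsets
) -> List[Dict[str, Any]]:
    ...  # pass limit and skip through to the backend exactly like the other filters
```

On the Mongo side, the cursor methods would then do the work at the database level:

```python
# Inside MongoBackend.get_all_books, after building find_condition:
cursor = (
    self._client[DB][BOOKS_COLLECTION]
    .find(find_condition, {"_id": 0})
    .skip(skip)      # the database drops the first `skip` documents
    .limit(limit)    # and returns at most `limit` of the rest
)
return [doc async for doc in cursor]
```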
It says that there is no silver bullet, and that the `skip` function of Mongo does not perform well; thus the author believes that the second option is better, purely for performance. If you have billions and billions of documents (e.g. Amazon), well, it may be the case to use something different, though by the time your website has grown that much, I guess you'll have the money to pay an entire team of experts to sort things out and possibly develop your own database.
Concluding, the `limit` and `skip` approach is the most common one. It is usually done at the database level, in order to reduce the amount of work for the application and the bandwidth used.
Mongo is not very efficient at skipping and limiting results. If your database has, say, a million documents, then I don't think you'll even notice. You could even use a relational database for such a workload. You can always benchmark the options you have and choose the most appropriate one.
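If you want to measure it yourself, a crude benchmark could look like this, assuming a running MongoDB and the `MongoBackend`, `DB` and `BOOKS_COLLECTION` from your snippets (the timing harness and the skip values are mine):

```python
import asyncio
import time

async def time_query(backend: MongoBackend, skip: int, limit: int) -> float:
    # Wall-clock timing of a single skip/limit query.
    start = time.perf_counter()
    cursor = (
        backend._client[DB][BOOKS_COLLECTION]
        .find({}, {"_id": 0})
        .skip(skip)
        .limit(limit)
    )
    await cursor.to_list(length=limit)
    return time.perf_counter() - start

async def main() -> None:
    backend = MongoBackend("mongodb://localhost:27017")
    # The deeper the skip, the more documents Mongo walks and throws away.
    for skip in (0, 10_000, 100_000):
        elapsed = await time_query(backend, skip, 10)
        print(f"skip={skip}: {elapsed:.4f}s")

asyncio.run(main())
```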
I don't know much about Mongo, but I know that, generally, indexes can help with limiting and skipping records (documents, in this case), though I'm not sure whether that is the case for Mongo as well.
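For reference, Mongo does support indexes, and a compound index covering the filtered fields plus the sort key is what typically keeps paginated queries cheap. A sketch with Motor (the field choice is mine, matching the `author` filter from your code):

```python
# One-off setup, e.g. at application startup: a compound index on a
# filtered field plus _id, which paginated, filtered queries can then use.
await self._client[DB][BOOKS_COLLECTION].create_index(
    [("author", 1), ("_id", 1)]
)
```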