mongodb circular-buffer

Mongo as circular buffer


I am trying to figure out a way to use MongoDB as a circular buffer. We currently use SQLite, but its performance does not fit our use case. The requirements are: the collection must empty itself every x seconds, and the collection must empty itself when a limit of y documents is reached.

Going through the MongoDB documentation, capped collections combined with change events seem like the way to go.

https://docs.mongodb.com/manual/core/capped-collections/

https://docs.mongodb.com/manual/reference/change-events/

The documentation states: "Capped collections work in a way similar to circular buffers"

However I am not sure how to:

  1. Empty the collection every x seconds. The MongoDB TTL feature is not an option, since TTL indexes are not supported on capped collections. Are there other alternatives?
  2. Retrieve any "removed documents". The replace operation type of change events looks like one approach. Are there other alternatives?

Has anyone tried using MongoDB as a circular buffer? Is the above (capped collections plus change events) the way to achieve it?

Thanks for any response.


Solution

  • From https://en.wikipedia.org/wiki/Circular_buffer:

    a circular buffer [...] is a data structure that uses a single, fixed-size buffer as if it were connected end-to-end.

    I'm afraid the "Capped collections work in a way similar to circular buffers" you quoted uses precisely this definition of the circular buffer.

    A capped collection is capped by size in bytes and, optionally, by number of documents. Old documents are not removed by a timer but by new documents. Think of it as the new documents overwriting the oldest ones.
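
As a rough illustration of those two limits, the sketch below creates a capped collection through the Node.js driver's db.createCollection with the capped, size, and max options. The helper name createCappedBuffer and the example limits are assumptions made for illustration, not values from the question.

```javascript
// Hypothetical helper: creates a capped collection that behaves like a
// fixed-size ring. `db` is assumed to be a connected Db handle from the
// official MongoDB Node.js driver.
async function createCappedBuffer(db, name, maxBytes, maxDocs) {
  // `size` (bytes) is mandatory for capped collections; `max` (document
  // count) is optional and only matters while the data still fits in `size`.
  return db.createCollection(name, {
    capped: true,
    size: maxBytes,
    max: maxDocs,
  });
}
```

For example, createCappedBuffer(db, 'buffer', 1048576, 1000) would ask for a 1 MiB collection holding at most 1000 documents. Once either limit is hit, each insert silently displaces the oldest document.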

    Unfortunately, this design makes it impossible to delete documents from the collection (https://docs.mongodb.com/manual/core/capped-collections/#document-deletion), neither by TTL nor explicitly. And since there is no formal deletion, there is no deletion event in the change stream.

    To put it simply, if you need to retrieve documents evicted from the buffer, you need to implement that yourself.

    A TTL index on a regular collection may work for you, but it is time bound, not size bound. It will issue a deletion event to the change stream, but there are a few things to consider:

    • You will need to keep a change stream client running at all times to ensure you catch every event.
    • The TTL index comes at a cost. MongoDB runs the TTL monitor thread every 60 seconds to delete outdated documents, and that consumes resources. It costs less than SQLite, but system performance may still degrade, and documents may not be deleted exactly after the specified amount of time if the server is busy with other operations.
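
A minimal sketch of the TTL plus change stream combination, using the Node.js driver's createIndex and watch. The helper name watchExpirations and the createdAt field are assumptions for illustration; the sketch also assumes a replica set or sharded cluster, since change streams are not available on a standalone server.

```javascript
// Hypothetical helper; `collection` is assumed to be a regular (non-capped)
// Collection from the official MongoDB Node.js driver.
async function watchExpirations(collection, ttlSeconds, onDelete) {
  // Documents expire roughly `ttlSeconds` after their `createdAt` value.
  // `createdAt` is an assumed field name; it must hold a BSON Date.
  await collection.createIndex(
    { createdAt: 1 },
    { expireAfterSeconds: ttlSeconds }
  );

  // Forward only delete events. By default a delete event carries just the
  // key of the removed document (its _id), not the document itself.
  const stream = collection.watch([
    { $match: { operationType: 'delete' } },
  ]);
  stream.on('change', (event) => onDelete(event.documentKey._id));
  return stream; // the caller should call stream.close() on shutdown
}
```

Note the limitation in the comment: because the delete event identifies the document only by key, this approach alone does not hand back the contents of the expired documents.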

    It would be advisable to take control and select and delete the documents yourself. I understand you already have an implementation that uses SQLite, so it is just a matter of adjusting it to use MongoDB instead.

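    db.collection.find({}).sort({_id: 1}).limit(1)
    

    This returns the oldest document, assuming _id holds the default ObjectId, whose leading timestamp makes ascending order roughly insertion order. It uses the default _id index and should perform well.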
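
Taking control of selection and deletion could look roughly like the sketch below, written against the Node.js driver (find, sort, toArray, deleteMany, insertOne, countDocuments). The helper names drainBuffer and pushToBuffer and the onBatch callback are hypothetical, and the sketch assumes a regular (non-capped) collection with default ObjectId _id values and a single writer process.

```javascript
// Hypothetical helper: hands every buffered document to `onBatch`, oldest
// first, then deletes exactly those documents. Returns the number deleted.
async function drainBuffer(collection, onBatch) {
  const docs = await collection.find({}).sort({ _id: 1 }).toArray();
  if (docs.length === 0) return 0;

  // Hand the documents over before deleting, so nothing is lost if the
  // handler throws.
  await onBatch(docs);

  // Delete only what was read; documents inserted in the meantime stay in
  // the buffer for the next drain.
  const ids = docs.map((d) => d._id);
  const result = await collection.deleteMany({ _id: { $in: ids } });
  return result.deletedCount;
}

// Hypothetical helper: inserts a document and empties the buffer once it
// holds at least `maxDocs` documents (the "limit of y documents" rule).
async function pushToBuffer(collection, doc, maxDocs, onBatch) {
  await collection.insertOne(doc);
  if ((await collection.countDocuments({})) >= maxDocs) {
    await drainBuffer(collection, onBatch);
  }
}
```

The "every x seconds" rule can then be covered by a timer in the application, for example setInterval(() => drainBuffer(collection, onBatch).catch(console.error), x * 1000). With several writers, or with a timer firing while a size-triggered drain is still running, two drains may read the same documents, so the handler would need to tolerate duplicates or the drains would need to be serialized.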