Search code examples
couchdbcloudant

How to get list of documents which are deduplicated (on timestamp - latest wins) if the document has a specific key?


I want to use the write-only (immutable) document strategy in cloudant partitioned db (described here), wherein for every document update, I create a new document with updated fields. For example: the db consists of orders (with fields order_id, details, time_of_doc_creation). Say a certain order's detail is required to be updated - I create a new document with updated data and preserving the order_id. Now at some point of time, I want to see all the orders in the system and in case there are documents whose order_id are same then only the one with latest date should end up in the list.

The only way I can think of doing this is pull all the documents from the db and then on the client side do the filtering by myself, but is there something that I can use in the db itself to get such results?

List of orders in db:

{
order_id:1,
details:"old",
time:"1"
},
{
order_id:2,
details:"old",
time:"2"
},
{
order_id:1,
details:"new",
time:"3"
},
{
order_id:2,
details:"new",
time:"4"
}

Desired output: Give me all the order documents in the system, so db output should look like->

{
order_id:1,
details:"new",
time:"3"
},
{
order_id:2,
details:"new",
time:"4"
}

Solution

  • Using an immutable model is often playing to Cloudant's strengths, but it's not always practical or even possible. So the first question to ask is perhaps how frequently updates would happen to documents if you chose a mutable model. If updates to an order is happening say a few times a second max, go with the mutable idea and check for (likely infrequent) conflicts.

    An immutable model works best as an alternative to updates that otherwise would happen in a few large documents that contains lists or objects -- like a set of rapidly ingested time-series events etc. The other way is to store a set of deltas, and grab the full set of of these deltas from a view and stitch back the order client-side.

    From what I understand from your example, it sounds like you are storing what amounts to full, new revisions of orders as new documents, rather than deltas, essentially bypassing Cloudant's own revision system. This isn't likely ideal, but you can get some way there maybe. With your example documents:

    function (doc) {
      emit([doc.order_id, doc.time], null);
    }
    

    To pick out the latest version of order "1", you can issue a query like:

    curl -s -g 'https://skruger.cloudant.com/demo2/_design/queries/_view/orders-by-time?startkey=[2]&endkey=[1]&include_docs=true&reduce=false&descending=true&limit=1'
    

    which gives

    {
      "total_rows": 4,
      "offset": 2,
      "rows": [
        {
          "id": "bfd5b38c482b04e45d35b6147adcc82a",
          "key": [
            1,
            3
          ],
          "value": null,
          "doc": {
            "_id": "bfd5b38c482b04e45d35b6147adcc82a",
            "_rev": "1-075a88a11bc842fa0def69556c81ab01",
            "order_id": 1,
            "details": "new",
            "time": 3
          }
        }
      ]
    }
    

    Note the reversed start and end keys. The database above is open for reading if you want to poke at it.