Search code examples
azure-cosmosdbgreatest-n-per-groupdocument-database

Get the latest document based on a document's property in Azure Cosmos DB


Let's say that I have a Cosmos DB collection with the SQL API that contains a list of messages that people have sent that give their current mood and a timestamp of when the message has been received. People can send messages whenever they want.

In my collection I have something that looks like this:

[
    {
      "PersonName": "John",
      "CurrentMood": "Moody",
      "TimeStamp": "2012-04-23T18:25:43.511Z",
      "id": "25123829-1745-0a09-5436-9cf8bdcc95e3"
    },
    {
      "PersonName": "Jim",
      "CurrentMood": "Happy",
      "TimeStamp": "2012-05-23T17:25:43.511Z",
      "id": "6feb7b41-4b85-164e-dcd4-4e078872c5e2"
    },
    {
      "PersonName": "John",
      "CurrentMood": "Moody",
      "TimeStamp": "2012-05-23T18:25:43.511Z",
      "id": "b021a4a5-ee92-282c-0fe0-b5d6c27019af"
    },
    {
      "PersonName": "Don",
      "CurrentMood": "Sad",
      "TimeStamp": "2012-03-23T18:25:43.511Z",
      "id": "ee72cb36-4304-06e5-ed7c-1d0ff890de48"
    }
]

I would like to send a query that gets the "current" mood of all the user who sent a message (the latest message received for all person).

It's relatively easy to do for each particular user, by combining TOP 1 and ORDER BY

SELECT TOP 1 *
FROM C
WHERE C.PersonName = "John"
ORDER BY C.TimeStamp

But I feel like looping through all my users and running the query for each might be very wasteful in resource and become expensive quickly, but I can't find a way that will work.

To note that I will quickly have a lot of persons who will send a lot of messages.


Solution

  • The common pattern for this is to have two collections, one that stores documents for (user, timestamp -> mood), then a downstream processor using Azure Functions or the Change feed API directly that computes the (user -> latest mood)

    [Mood Time series Collection] ==> Lambda ==> [Latest Mood Collection]

    And the Latest Mood Collection will look something like this for the data stream above. You then use this for your lookups (which are now key lookups).

    {
      "PersonName": "Jim",
      "LatestMood": "Happy",
      "LatestTimeStamp": "2012-05-23T17:25:43.511Z",
      "id": "6feb7b41-4b85-164e-dcd4-4e078872c5e2"
    },
    {
      "PersonName": "John",
      "LatestMood": "Moody",
      "LatestTimeStamp": "2012-05-23T18:25:43.511Z",
      "id": "b021a4a5-ee92-282c-0fe0-b5d6c27019af"
    },
    {
      "PersonName": "Don",
      "LatestMood": "Sad",
      "LatestTimeStamp": "2012-03-23T18:25:43.511Z",
      "id": "ee72cb36-4304-06e5-ed7c-1d0ff890de48"
    }