Search code examples
jquerymongodbperiod

Mongodb Query - cumulative sum


In my database I have activities performed in different periods (start in a day and finish another day). This is an example of a document:

    {
        "activity" : "test",
        "period" : [ ISODate("2020-01-01T00:00:00Z"), ISODate("2020-02-10T00:00:00Z")]
    }

The "period" is an array of 2 elements, the first is the start time and the second is the end.

I would like to write a query which gives me in output the number of activities in the time.

For better understating, considering this 2 activities:

{
    "activity" : "first_activity",
    "period" : [ ISODate("2020-01-01T00:00:00Z"),ISODate("2020-02-10T00:00:00Z")]
},
{
    "activity" : "second_activity",
    "period" : [ ISODate("2020-01-13T00:00:00Z"), ISODate("2020-01-26T00:00:00Z")]
}

My goal is an output which allows me to draw a graph of couples ( date , #activities ), something like this:

{
        "quantity" : 1,
        "date" : ISODate("2020-01-01T00:00:00Z")
},
{
        "quantity" : 2,
        "date" : ISODate("2020-01-13T00:00:00Z")
},
{
        "quantity" : 1,
        "date" : ISODate("2020-01-26T00:00:00Z")
},
{
        "quantity" : 0,
        "date" : ISODate("2020-02-10T00:00:00Z")
}

Unfortunately I didn't reach the final result yet... and probably I'm writing a query not in a "very nice way"... My logic flow is the following:

  1. using unwind for twice each document keeping in memory the arrayindex,
  2. setting the +1/-1 value for later counting in accordance to the arrayindex,
  3. sorting per date all the results
  4. grouping all the documents per day, making the sum of the value (+1 and -1)
  5. MISSING STEP: The sum should be accumulative

So, by writing this query:

[
 { $unwind : {
        path: "$period",
        includeArrayIndex: "arrayIndex"
      }
    },
 {
    $project: { _id:0, date: "$period", val: { $cond: { if : { $lt: ["$arrayIndex",1] } , then: 1 , else: -1 } }} 
 },
 {
     $sort : { date : 1}
 },
 {
     "$group":{
       _id : { $dateToString: { format: "%Y-%m-%d", date: "$date" } },
       delta: { $sum: "$val" }
     }
 }
]

I could reach the result where I have the list of days when the number of activities changed together with the number of how much it changed (delta).

Something like this can be the output:

{ _id : "2020-09-30", delta: 3 }
{ _id : "2020-11-04", delta: -2 }
{ _id : "2020-12-01", delta: 1 }

But my goal is to have a "cumulative sum" so a result like this:

{ _id : "2020-09-30", count: 3 }
{ _id : "2020-11-04", count: 1 }
{ _id : "2020-12-01", count: 2 }

What kind of additional query I should develop?


Solution

  • At the end I solved in a way which sound too much complex... I think that the query is too slow. I had to save 2 array with all the delta values and one with all the date. Then I could play doing the sum until the specific position (thanks to the arrayindex saved on the second $unwind).

    So to the query insert in the question I added the following:

    {
        "$group":{
          "_id":0,
          "time":{ "$push":"$_id" },
          "totals":{ "$push":"$delta" }
        }
      },
      {
        "$unwind":{ "path":"$time", "includeArrayIndex":"index" }
      },
      {
        "$project":{
          "_id":0,
          "time":1,
          "delta":{ "$arrayElemAt":[ "$totals", "$index" ] },
          "total":{ "$sum":{ "$slice":[ "$totals", { "$add":[ "$index", 1 ] } ] } }
     }
     }