Search code examples
mongodbdictionarycountlet

Mongo Aggregation: Use $count of found documents for query


I have a Mongo collection series where each document has a list with dataPoints. All series with the same testStepId contain the same amount of dataPoints:

{
  "seriesId": {
    "seriesId": "77678ca1-31db-4cec-a042-68a3053b92c6"
  },
  "testStepId": {
    "testStepId": "c152415b-2392-4c2b-af74-51a4973bd257"
  },
  "measurement": {
    "startTime": {
      "$date": "2020-07-07T12:40:49.782Z"
    },
    "endTime": {
      "$date": "2020-07-07T12:42:19.782Z"
    }
  },
  "dataPoints": [
    {
      "timeStamp": {
        "$date": "2020-07-07T12:41:09.782Z"
      },
      "value": "Value_1_1"
    },
    {
      "timeStamp": {
        "$date": "2020-07-07T12:41:29.782Z"
      },
      "value": "Value_1_2"
    },
    {
      "timeStamp": {
        "$date": "2020-07-07T12:41:39.782Z"
      },
      "value": "Value_1_3"
    },
    ...
    {
      "timeStamp": {
        "$date": "2020-07-07T12:42:19.782Z"
      },
      "value": "Value_2_11"
    }
  ]
}

Now I want to query all series documents that match a specific testStepId (no problem). But instead of loading all dataPoints of all found series I want to load in sum only 1000 dataPoints. So in case of 10 found series I have to load only 100 dataPoints per series:

-> Load every (dataPoints.size() / 100)-th dataPoint

-> This means I have to consider the count of found series documents and the count of dataPoints in the series

-> Load every X-th dataPoint where

X = 1000 / <count of documents> / <count of dataPoints>

I'm trying hard to get this done via aggregation with the MongoDB Compass. But I'm still failing with counting the found documents and unsing this value ...

To start simple I just try to get every 2nd dataPoint:

{
    project: {
        dataPoints: {
            $map: {
                input: { $range: [ 0, {"$size": "$dataPoints"}, 2 ] },
                as: "index",
                in: { $arrayElemAt: [ "$dataPoints", "$$index" ] }
            }
        }
    }
}

-> Works fine

Now I want to get every x-th 'dataPoint' dependend on the count of found documents. For this I tried some different ways, none of them works ...

  1. try: Use $count instead of fixed number:
{
    project: {
        dataPoints: {
            $map: {
                input: { $range: [ 0, {"$size": "$dataPoints"}, $count ] },
                as: "index",
                in: { $arrayElemAt: [ "$dataPoints", "$$index" ] }
            }
        }
    }
}

-> "Project specification must be an object"

  1. try: Define the count as variable:
{
    project: {
        dataPoints: {
            $let: {
                vars: { 
                    total: "$count",
                },
                in: { 
                    $map: {
                        input: { $range: [ 0, {"$size": "$dataPoints"}, "$$total"] },
                        as: "index",
                        in: { $arrayElemAt: [ "$dataPoints", "$$index" ] }
                    }
                }
            }
        }
    }   
}

-> "$range requires a numeric value step, found value of type:missing"

Obviously my approach is wrong. Could any body give me some hint how to get this working?


Solution

  • I think the formula for X is X = <count of dataPoints> * <count of documents> / 1000

    You can not directly access the number of documents (count) at a particular aggregation pipeline stage. However, you can combine all documents into a single document and count them, then expand them back into separate documents. You can achieve this by using $group or $facet.

    I'll show an example with $group

    [
      {
        $group: {
          _id: null,
          count: { $sum: 1 },
          all: { $push: "$$ROOT" }
        }
      },
      {
        $unwind: "$all"
      },
      {
        $replaceWith: { // $replaceWith is available from v4.2, for earlier version use { $replaceRoot: { newRoot: <doc> } }
          $mergeObjects: [
            "$all",
            {
              dataPoints: {
                $map: {
                  input: {
                    $range: [
                      0,
                      { $size: "$all.dataPoints" },
                      {
                        $ceil: {
                          $divide: [
                            {
                              $multiply: [
                                { "$size": "$all.dataPoints" },
                                "$count"
                              ]
                            },
                            1000
                          ]
                        }
                      }
                    ]
                  },
                  as: "index",
                  in: { $arrayElemAt: ["$all.dataPoints", "$$index"] }
                }
              }
            }
          ]
        }
      }
    ]
    

    Mongo Playground