Search code examples
databasemongodbcollectionsmaxaggregation

Getting all data with highest value in MongoDB


I have a collection called "Vulnerabilities" in MongoDB 5. It looks like this:

[
  { _id: "...", "project": 244, "scanner": "sonarqube", "version": 1 },
  { _id: "...", "project": 244, "scanner": "sonarqube", "version": 2 },
  { _id: "...", "project": 244, "scanner": "sonarqube", "version": 2 },
  { _id: "...", "project": 244, "scanner": "shellcheck", "version": 1 },
  { _id: "...", "project": 244, "scanner": "shellcheck", "version": 2 },
  { _id: "...", "project": 244, "scanner": "shellcheck", "version": 3 },
  { _id: "...", "project": 244, "scanner": "powershell", "version": 2 },
  { _id: "...", "project": 244, "scanner": "powershell", "version": 3 },
  { _id: "...", "project": 244, "scanner": "powershell", "version": 4 },
  { _id: "...", "project": 244, "scanner": "powershell", "version": 4 }
]

I would like to retrieve, for each scanner, the documents with the highest version. It should return:

[
  { _id: "...", "project": 244, "scanner": "sonarqube", "version": 2 },
  { _id: "...", "project": 244, "scanner": "sonarqube", "version": 2 },
  { _id: "...", "project": 244, "scanner": "shellcheck", "version": 3 },
  { _id: "...", "project": 244, "scanner": "powershell", "version": 4 },
  { _id: "...", "project": 244, "scanner": "powershell", "version": 4 }
]

I would like to do it with only one query.

Any suggestions?

Many thanks


Solution

  •   db.collection.aggregate([
       {
         $sort: {
               version: -1
               }
      },
      {
       $group: {
      _id: "$scanner",
      version: {
        $first: "$version"
      },
      test: {
        $push: {
          v: "$version",
          id: "$_id",
          p: "$project"
          }
         }
        }
      },
      {
       $project: {
         items: {
          $filter: {
            input: "$test",
            as: "item",
            cond: {
            $eq: [
              "$$item.v",
              "$version"
            ]
           }
          }
         }
       }
      },
       {
        $unwind: "$items"
       },
       {
        $project: {
         scanner: "$_id",
         _id: "$items.id",
         project: "$items.p",
         version: "$items.v"
        }
       }
      ])
    

    Explained:

    1. Order(Sort) descending by version
    2. group by scanner taking the first value from the list per version and pushing all values in test array so we have suitable for filter in next stage
    3. Filter from test array only the elements with the max version we need
    4. Unwind the test array where only max values are filtered
    5. In the final $project stage rename the fields as per original names you need

    playground