Search code examples
mongodbmongodb-queryaggregation-framework

MongoDB aggregation Query;: find documents having count more than 1


Scenario

Find documents and return specific fields if there are uniquekey field that has more than 1 occurrences in the collection. in the sample data uniqueKey 100 and 800 are having more than 1 occurrences.

Sample data and query can be found here:

https://mongoplayground.net/p/FZ15zpXffUh

JSON documents:

[
  {
    "cust_id": 1,
    "cust_name": "cust_1",
    "uniqueKey": 100
  },
  {
    "cust_id": 2,
    "cust_name": "cust_2",
    "uniqueKey": 400
  },
  {
    "cust_id": 3,
    "cust_name": "cust_3",
    "uniqueKey": 100
  },
  {
    "cust_id": 4,
    "cust_name": "cust_4",
    "uniqueKey": 800
  },
  {
    "cust_id": 5,
    "cust_name": "cust_5",
    "uniqueKey": 800
  }
]

Current attempt:

db.collection.aggregate([
  {
    "$group": {
      "_id": "$uniqueKey",
      "count": {
        "$sum": 1
      }
    }
  },
  {
    "$match": {
      count: {
        "$gt": 1
      }
    }
  },
  {
    "$project": {
      "count": 1,
      "cust_name": 1
    }
  }
])

This returns below output and count match works correctly but not projection

{
    "_id": 100,
    "count": 2
},
{
    "_id": 800,
    "count": 2
}

Expected output:

{ 
    cust_name: cust_1,
    uniqueKey: 100
}
{
    cust_name: cust_3,
    uniqueKey: 100
}
{  
    cust_name: cust_4,
    uniqueKey: 800
}
{  
    cust_name: cust_5,
    uniqueKey: 800
}

Any help appreciated for the correct aggregation query to get the expected output. I tried various aggregation samples yet to arrive at a working solution


Solution

  • Since mongoDB version 5.0, one option is to use $setWindowFields:

    db.collection.aggregate([
      {$setWindowFields: {
          "partitionBy": "$uniqueKey",
          "sortBy": {"cust_id": 1},
          "output": {
            "count": {
              "$sum": 1,
              "window": {documents: ["unbounded", "unbounded"]
              }
            }
          }
      }},
      {"$match": {count: {"$gt": 1}}},
      {"$project": {_id: 0, "uniqueKey": 1, "cust_name": 1}}
    ])
    

    See how it works on the playground example

    This will prevent pushing all the documents with the same uniqueKey into one big document.

    Before mongoDB 5.0:

    One option is to use $group, but you need to use $push on your $group and $unwind afterwards. This option may be problematic if you have many documents with the same uniqueKey