Search code examples
mongodbmongodb-queryaggregateaggregation

How to make a SELECT DISTINCT using aggregation MongoDB


I am studying Mongo queries and I have a question about how to make a 'Select Distinct' on Mongo query using aggregate. I am really close to finish it.

My current output:

[
  {
    "user_id": "256f5280-fb49-4ad6-b7f5-65c4329d46e0"
  },
  {
    "user_id": "e194d667-d79f-4262-94b1-ecf4561c9418"
  }
]

This is the required output:

[
  {
    "time": 1922471890,
    "user_id": "256f5280-fb49-4ad6-b7f5-65c4329d46e0",
    "message": "This is an example of my db (4)"
  },
  {
    "time": 1622471890,
    "user_id": "e194d667-d79f-4262-94b1-ecf4561c9418",
    "message": "This is an example of my db (3)"
  }
]

My current aggregate query:

db.collection.aggregate([
  {
    $match: {
      user_id: {
        $regex: ""
      }
    }
  },
  {
    $sort: {
      time: -1
    }
  },
  {
    $group: {
      _id: null,
      user_id: {
        $addToSet: "$user_id",
        
      },
      
    }
  },
  {
    $unwind: "$user_id"
  },
  {
    $project: {
      _id: 0
    }
  },
  {
    $out: "collection"
  }
])

The dataset:

[
  {
    "time": 1422471890,
    "user_id": "256f5280-fb49-4ad6-b7f5-65c4329d46e0",
    "message": "This is an example of my db"
  },
  {
    "time": 1622471890,
    "user_id": "256f5280-fb49-4ad6-b7f5-65c4329d46e0",
    "message": "This is an example of my db (1)"
  },
  {
    "time": 1622471890,
    "user_id": "256f5280-fb49-4ad6-b7f5-65c4329d46e0",
    "message": "This is an example of my db (2)"
  },
  {
    "time": 1622471890,
    "user_id": "e194d667-d79f-4262-94b1-ecf4561c9418",
    "message": "This is an example of my db (3)"
  },
  {
    "time": 1922471890,
    "user_id": "256f5280-fb49-4ad6-b7f5-65c4329d46e0",
    "message": "This is an example of my db (4)"
  }
]

When I get all data, I want no duplicated user_id, so I just need the first found information from an id.

The Mongo Playground to this Mongo query.


Solution

  • Query

    • group by $user_id to have only distinct users
    • take the $first value of the duplicate values for the other fields

    *you can add sort or match stage based on your needs

    Playmongo

    aggregate(
    [{"$group": 
       {"_id": "$user_id",
        "time": {"$first": "$time"},
        "message": {"$first": "$message"}}}])