I have a MongoDB collection with data in the following format:
[{"name":"axe1","base-url":"www.example1.com","date":"2022-06-22 11:20", "ml_pred":"Invalid","hum_pred":"valid"},
{"name":"axe2","base-url":"www.example2.com","date":"2022-06-22 12:20", "ml_pred":"Valid","hum_pred":"null"},
{"name":"axe1","base-url":"www.example1.com","date":"2022-06-22 22:20", "ml_pred":"Invalid","hum_pred":"valid"},
{"name":"axe3","base-url":"www.example3.com","date":"2022-06-22 02:20", "ml_pred":"Valid","hum_pred":"null"},
{"name":"axe2","base-url":"www.example2.com","date":"2022-06-22 06:20", "ml_pred":"Invalid","hum_pred":"valid"},
{"name":"axe1","base-url":"www.example1.com","date":"2022-06-22 14:20", "ml_pred":"Invalid","hum_pred":"null"},
{"name":"axe1","base-url":"www.example1.com","date":"2022-06-22 10:20", "ml_pred":"Invalid","hum_pred":"invalid"},
{"name":"axe1","base-url":"www.example1.com","date":"2022-06-22 01:20", "ml_pred":"Invalid","hum_pred":"null"}]
I am trying to get the unique combinations of base-url and name as the response. For that I used pymongo's distinct, as shown below:
filter_stuff = {'base-url': 1, 'name':1,'_id': 0}
data = list(crawlcol.find({},filter_stuff).distinct("base-url"))
That returns only a list of base URLs, but I am expecting output like this:
[{"name":"axe1","base-url":"www.example1.com"},
{"name":"axe2","base-url":"www.example2.com"},
{"name":"axe3","base-url":"www.example3.com"}]
How can this be obtained?
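Grouping on both fields in an aggregation pipeline returns each unique name/base-url pair. $group nests the pair under _id, so a $replaceRoot stage is added here to promote it to the top level and match the expected output (the order of the groups is not guaranteed unless you add a $sort stage):
result = list(crawlcol.aggregate(
[
# One group per distinct (base-url, name) combination
{"$group": {"_id": {"base-url": "$base-url", "name": "$name"}}},
# Lift the grouped pair out of _id so each result is a flat document
{"$replaceRoot": {"newRoot": "$_id"}}
]
))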
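If you would rather deduplicate on the client side, the same result can be approximated in plain Python. This is only a minimal sketch: the helper name unique_pairs and the sample list are made up for illustration, and it assumes the collection is small enough to iterate over in memory.

```python
def unique_pairs(docs, keys=("name", "base-url")):
    """Return one dict per distinct combination of `keys`, in first-seen order.

    `unique_pairs` is a hypothetical helper, not part of pymongo.
    """
    seen = set()
    result = []
    for doc in docs:
        # Build a hashable tuple so the combination can be tracked in a set.
        pair = tuple(doc.get(k) for k in keys)
        if pair not in seen:
            seen.add(pair)
            result.append(dict(zip(keys, pair)))
    return result


# Sample rows shaped like the projected documents from the question.
sample = [
    {"name": "axe1", "base-url": "www.example1.com"},
    {"name": "axe2", "base-url": "www.example2.com"},
    {"name": "axe1", "base-url": "www.example1.com"},
    {"name": "axe3", "base-url": "www.example3.com"},
    {"name": "axe2", "base-url": "www.example2.com"},
]

unique = unique_pairs(sample)
print(unique)
```

With a real collection, the cursor from the question's projection can be passed in directly, for example unique_pairs(crawlcol.find({}, {"base-url": 1, "name": 1, "_id": 0})). The aggregation keeps the work on the server, so it is usually the better choice for large collections.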