Tags: python, mongodb, pymongo

Obtaining distinct values from MongoDB as a list of dictionaries using PyMongo


I have a MongoDB collection with data in the following format:

[{"name":"axe1","base-url":"www.example1.com","date":"2022-06-22 11:20", "ml_pred":"Invalid","hum_pred":"valid"},
 {"name":"axe2","base-url":"www.example2.com","date":"2022-06-22 12:20", "ml_pred":"Valid","hum_pred":"null"},
 {"name":"axe1","base-url":"www.example1.com","date":"2022-06-22 22:20", "ml_pred":"Invalid","hum_pred":"valid"},
 {"name":"axe3","base-url":"www.example3.com","date":"2022-06-22 02:20", "ml_pred":"Valid","hum_pred":"null"},
 {"name":"axe2","base-url":"www.example2.com","date":"2022-06-22 06:20", "ml_pred":"Invalid","hum_pred":"valid"},
 {"name":"axe1","base-url":"www.example1.com","date":"2022-06-22 14:20", "ml_pred":"Invalid","hum_pred":"null"},
 {"name":"axe1","base-url":"www.example1.com","date":"2022-06-22 10:20", "ml_pred":"Invalid","hum_pred":"invalid"},
 {"name":"axe1","base-url":"www.example1.com","date":"2022-06-22 01:20", "ml_pred":"Invalid","hum_pred":"null"}]

I am trying to get the unique base-url and name pairs as a response. For that, I used PyMongo's distinct like this:

filter_stuff = {'base-url': 1, 'name':1,'_id': 0}
data = list(crawlcol.find({},filter_stuff).distinct("base-url"))
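The reason for the flat list is that distinct works on a single key and returns that key's unique values; the distinct command has no projection option, so the projection passed to find does not change the shape of the result. The sketch below is a rough in-memory model of that behaviour, using a few of the sample documents. It is an illustration only, not how PyMongo or the server implements distinct: distinct_values is a hypothetical helper, it ignores details such as array fields, and the server does not promise any particular result order.

```python
def distinct_values(docs, key):
    """Hypothetical model of distinct(key): unique values of ONE key, first-seen order."""
    seen = []
    for doc in docs:
        if key not in doc:
            continue  # this sketch simply skips documents that lack the key
        value = doc[key]
        if value not in seen:
            seen.append(value)
    return seen


docs = [
    {"name": "axe1", "base-url": "www.example1.com"},
    {"name": "axe2", "base-url": "www.example2.com"},
    {"name": "axe1", "base-url": "www.example1.com"},
    {"name": "axe3", "base-url": "www.example3.com"},
]

# Only scalar values come back, never documents, whatever projection was used.
print(distinct_values(docs, "base-url"))
# ['www.example1.com', 'www.example2.com', 'www.example3.com']
```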

This returned a plain list of base URLs, but I am expecting output like this:

[{"name":"axe1","base-url":"www.example1.com"},
 {"name":"axe2","base-url":"www.example2.com"},
 {"name":"axe3","base-url":"www.example3.com"}]
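For a small collection, the expected output can also be computed client-side, which is handy for checking whatever the database returns. This is a minimal sketch, assuming the documents have already been fetched into a Python list (here the sample data, reduced to the two relevant fields); unique_pairs is a hypothetical helper name. It transfers every document over the network, so for large collections a server-side aggregation is the better choice.

```python
def unique_pairs(docs, fields=("name", "base-url")):
    """Hypothetical helper: one dict per unique combination of `fields`, first-seen order."""
    seen = set()
    result = []
    for doc in docs:
        key = tuple(doc.get(f) for f in fields)  # hashable identity of the combination
        if key not in seen:
            seen.add(key)
            result.append({f: doc.get(f) for f in fields})
    return result


# The eight sample documents from the question, keeping only name and base-url.
docs = [
    {"name": "axe1", "base-url": "www.example1.com"},
    {"name": "axe2", "base-url": "www.example2.com"},
    {"name": "axe1", "base-url": "www.example1.com"},
    {"name": "axe3", "base-url": "www.example3.com"},
    {"name": "axe2", "base-url": "www.example2.com"},
    {"name": "axe1", "base-url": "www.example1.com"},
    {"name": "axe1", "base-url": "www.example1.com"},
    {"name": "axe1", "base-url": "www.example1.com"},
]

print(unique_pairs(docs))
# [{'name': 'axe1', 'base-url': 'www.example1.com'}, {'name': 'axe2', 'base-url': 'www.example2.com'}, {'name': 'axe3', 'base-url': 'www.example3.com'}]
```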

How can this be obtained?


Solution

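  • A $group stage keyed on both fields returns each unique (base-url, name) pair once. Because $group nests the grouped fields under _id, a $project stage is needed to produce the flat dictionaries shown in the question:

    result = list(crawlcol.aggregate(
        [
            # one document per unique (base-url, name) pair, nested under _id
            {"$group": {"_id": {"base-url": "$base-url", "name": "$name"}}},
            # lift the grouped fields out of _id and drop _id itself
            {"$project": {"_id": 0, "name": "$_id.name", "base-url": "$_id.base-url"}},
        ]
    ))

    Note that $group does not guarantee output order; append a {"$sort": {"name": 1}} stage if the order matters.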
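If the same kind of query is needed for other field combinations, the two stages can be generated from a list of field names. This is a sketch under stated assumptions: unique_combinations_pipeline is a hypothetical helper, and it assumes top-level field names that contain no dots and do not start with "$". It only builds the pipeline as plain Python data; running it still goes through the collection's aggregate method.

```python
def unique_combinations_pipeline(fields):
    """Hypothetical helper: $group + $project stages yielding one flat dict per unique combination."""
    # Group on an embedded document made of the requested fields.
    group = {"$group": {"_id": {f: "$" + f for f in fields}}}
    # Copy each field back out of _id and exclude _id from the output.
    project = {"$project": {"_id": 0, **{f: "$_id." + f for f in fields}}}
    return [group, project]


pipeline = unique_combinations_pipeline(["base-url", "name"])
print(pipeline)

# Usage against the question's collection (requires a running MongoDB):
# result = list(crawlcol.aggregate(pipeline))
```

For ["base-url", "name"] this produces exactly the $group and $project stages from the solution, so the same caveat about result order applies.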