Search code examples
mongodbpymongomongodb-aggregation

MongoDB, PyMongo - aggregate with find conditions


I have > 8000 records in my DB and this is one of them :

{
    "_id" : ObjectId("57599c498c39598eafb781b9"),
    "_class" : "vn.cdt.entity.db.AccessLog",
    "url" : "/shop/huenguyenshop/browse",
    "ip" : "10.0.0.238",
    "sessionId" : "86E5CF8E6D465A6EDFE7C9BF7890AA4B",
    "oldSessionId" : "86E5CF8E6D465A6EDFE7C9BF7890AA4B",
    "cookie" : "{\"sessionId\":\"86E5CF8E6D465A6EDFE7C9BF7890AA4B\",\"objects\":[{\"id\":\"903815555908\",\"type\":\"VIEW_SHOP\",\"count\":1}]}",
    "isCookie" : true,
    "createTime" : NumberLong(1464935913641),
    "objectId" : "903815555908",
    "type" : "VIEW_SHOP"
}

What i want to do :

I want to find all record have same oldSessionId with (type: VIEW_ITEM or type: BUY_ITEM) and createTime is lastest.

What i had tried :

pipeline = ([
                {"$group" : { "_id": "$oldSessionId", "count": { "$sum": 1 } }},
                {"$match": {"count" : {"$gt": 1} } },
                {"$project": {"oldSessionId" : "$_id", "_id" : 0} }
            ])

But that pipeline give me only sessionId

    find({'createTime': {'$lt':1464419127000, '$gt':1464332727000}, 
'$or':[{'type':'BUY_ITEM'},{'type':'VIEW_ITEM'}]})

That find give me all records with type: VIEW_ITEM or type: BUY_ITEM in specific time.

I don't know how to add filter with type and createTime to get what i want.

Update Thanks @chridam for helping me :

If i want add specific date to aggregation, i can make add query like this :

 pipeline = \
    (
        [
            { "$match": {
                         "createTime": {"$lt":1464419127000, "$gt":1464332727000 },
                         "type": { "$in": ["VIEW_ITEM", "BUY_ITEM"] }
                        }
            },
            { "$sort": { "createTime": -1, "oldSessionId": 1 } },
            {
                "$group":
                    { "_id": "$oldSessionId",
                      "_class": { "$first": "$_class" },
                      "url": { "$first": "$url" },
                      "ip": { "$first": "$ip" },
                      "sessionId": { "$first": "$sessionId" },
                      "oldSessionId": { "$first": "$oldSessionId" },
                      "cookie": { "$first": "$cookie" },
                      "isCookie": { "$first": "$isCookie" },
                      "createTime": { "$first": "$createTime" },
                      "objectId": { "$first": "$objectId" },
                      "type": { "$first": "$type" },
                    }
            }

        ]

    )

Solution

  • To get all the documents that have the same oldSessionId with (type: VIEW_ITEM or type: BUY_ITEM) and createTime is lastest, you need to conduct an aggregation pipeline show that has the following actors (stages):

    1. $match stage:

      • This will filter all documents that have a type of either VIEW_ITEM or BUY_ITEM. You can use the $in operator with the query as it allows you to select the documents where the value of the type field equals any value a the specified array, which happens to be a list with the two possible type values i.e. ["VIEW_ITEM", "BUY_ITEM"].
    2. $sort stage

      • this will afford the documents from the previous pipeline (above) to be ordered. This is necessary because you want to aggregate these filtered documents on the latest createTime field.
    3. $group stage

      • In this ultimate step, you group all the ordered documents by the oldSessionId key, add the fields you want using the $first operator.

    Piecing all the above pipes together to form the following aggregation pipeline:

    pipeline = [
        { "$match": {  "type": { "$in": ["VIEW_ITEM", "BUY_ITEM"] } } },
        { "$sort": { "createTime": -1, "oldSessionId": 1 } },
        {
            "$group": {
                "_id": "$oldSessionId",
                "_class": { "$first": "$_class" },
                "url": { "$first": "$url" },
                "ip": { "$first": "$ip" },
                "sessionId": { "$first": "$sessionId" },
                "cookie": { "$first": "$cookie" },
                "isCookie": { "$first": "$isCookie" },
                "createTime": { "$first": "$createTime" },
                "objectId": { "$first": "$objectId" },
                "type": { "$first": "$type" },
            }
        }
    ]