Search code examples
mongodbaggregation-framework

MongoDB filtering out subdocuments with lookup aggregation


Our project database has a capped collection called values which gets updated every few minutes with new data from sensors. These sensors all belong to a single sensor node, and I would like to query the last data from these nodes in a single aggregation. The problem I am having is filtering out just the last of ALL the types of sensors while still having only one (efficient) query. I looked around and found the $group argument, but I can't seem to figure out how to use it correctly in this case.

The database is structured as follows:

nodes:

{
    "_id": 681
    "sensors": [
            {
                "type": "foo"
            },
            {
                "type": "bar"
            }
    ]
}

values:

{
    "_id" : ObjectId("570cc8b6ac55850d5740784e"),
    "timestamp" : ISODate("2016-04-12T12:06:46.344Z"),
    "type" : "foo",
    "nodeid" : 681,
    "value" : 10
}

{
    "_id" : ObjectId("190ac8b6ac55850d5740776e"),
    "timestamp" : ISODate("2016-04-12T12:06:46.344Z"),
    "type" : "bar",
    "nodeid" : 681,
    "value" : 20
}

{
    "_id" : ObjectId("167bc997bb66750d5740665e"),
    "timestamp" : ISODate("2016-04-12T12:06:46.344Z"),
    "type" : "bar",
    "nodeid" : 200,
    "value" : 20
}

{
    "_id" : ObjectId("110cc9c6ac55850d5740784e"),
    "timestamp" : ISODate("2016-04-09T12:06:46.344Z"),
    "type" : "foo",
    "nodeid" : 681,
    "value" : 12
}

so let's imagine I want the data from node 681, I would want a structure like this:

nodes:

{
    "_id": 681
    "sensors": [
            {
                "_id" : ObjectId("570cc8b6ac55850d5740784e"),
                "timestamp" : ISODate("2016-04-12T12:06:46.344Z"),
                "type" : "foo",
                "nodeid" : 681,
                "value" : 10
            },
            {
                "_id" : ObjectId("190ac8b6ac55850d5740776e"),
                "timestamp" : ISODate("2016-04-12T12:06:46.344Z"),
                "type" : "bar",
                "nodeid" : 681,
                "value" : 20
            }
    ]
}

Notice how one value of foo is not queried, because I want to only get the latest value possible if there are more than one value (which is always going to be the case). The ordering of the collection is already according to the timestamp because the collection is capped.

I have this query, but it just gets all the values from the database (which is waaay too much to do in a lifetime, let alone one request of the web app), so I was wondering how I would filter it before it gets aggregated.

query:

db.nodes.aggregate(
        [
            {
                $unwind: "$sensors"
            }, 
            {
                $match:{
                    nodeid: 681
                       }
            }, 
            {
                $lookup:{
                        from: "values", localField: "sensors.type", foreignField: "type", as: "sensors"
                    }
             }
         }
    ]
)

Solution

  • as far as I got document structure, there is no need to use $lookup as all data is in readings(values) collection.

    Please see proposed solution:

    db.readings.aggregate([{
                $match : {
                    nodeid : 681
                }
            },
            {
                $group : {
                    _id : {
                        type : "$type",
                        nodeid : "$nodeid"
                    },
                    readings : {
                        $push : {
                            timestamp : "$timestamp",
                            value : "$value",
                            id : "$_id"
                        }
                    }
                }
            }, {
                $project : {
                    _id : "$_id",
                    readings : {
                        $slice : ["$readings", -1]
                    }
                }
            }, {
                $unwind : "$readings"
            }, {
                $project : {
                    _id : "$readings.id",
                    type : "$_id.type",
                    nodeid : "$_id.nodeid",
                    timestamp : "$readings.timestamp",
                    value : "$readings.value",
    
                }
            }, {
                $group : {
                    _id : "$nodeid",
                    sensors : {
                        $push : {
                            _id : "$_id",
                            timestamp : "$timestamp",
                            value : "$value",
                            type:"$type"
                        }
                    }
                }
            }
    
        ])
    

    and output:

    {
        "_id" : 681,
        "sensors" : [ 
            {
                "_id" : ObjectId("110cc9c6ac55850d5740784e"),
                "timestamp" : ISODate("2016-04-09T12:06:46.344Z"),
                "value" : 12,
                "type" : "foo"
            }, 
            {
                "_id" : ObjectId("190ac8b6ac55850d5740776e"),
                "timestamp" : ISODate("2016-04-12T12:06:46.344Z"),
                "value" : 20,
                "type" : "bar"
            }
        ]
    }
    

    Any comments welcome!