Tags: mongodb, optimization, query-optimization, pymongo

Optimize MongoDB aggregation that uses $min and $max


I have about 3 million documents in my database. I run this query to get the min and max latitudes and longitudes contained in my documents, because in my app I want to zoom to the bounding box of the existing data within a square. It takes about 16 seconds to execute:

square represents 4 coordinates. tMin and tMax are my time interval (dates).

    cursor = db.collection.aggregate([
        {
            "$match": {
                "nodeLoc": {"$geoWithin": {"$geometry": square}},
                "t": {"$gt": tMin, "$lt": tMax}
            }
        },
        {
            "$group": {
                "_id": {},
                "minLat": {"$min": {"$arrayElemAt": ["$nodeLoc.coordinates", 1]}},
                "maxLat": {"$max": {"$arrayElemAt": ["$nodeLoc.coordinates", 1]}},
                "minLon": {"$min": {"$arrayElemAt": ["$nodeLoc.coordinates", 0]}},
                "maxLon": {"$max": {"$arrayElemAt": ["$nodeLoc.coordinates", 0]}}
            }
        }
    ])

Is there a way to optimize the $group or the $match stage? I already created a compound index on nodeLoc (2dsphere) and t, but I didn't see any improvement.

Edit:

I removed the indexes, but the query execution time stayed the same.
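One way to see whether the $match predicate can use the index at all is to run the equivalent find() through explain() and walk the winning plan. A minimal sketch, assuming `db`, `square`, `tMin`, and `tMax` are defined as in the question; `winning_plan_stages` is a hypothetical helper that walks the documented `queryPlanner.winningPlan` structure:

```python
def winning_plan_stages(explain_doc):
    """Collect stage names (e.g. FETCH, IXSCAN, COLLSCAN) from an explain document."""
    stages = []
    stage = explain_doc["queryPlanner"]["winningPlan"]
    while stage is not None:
        stages.append(stage["stage"])
        stage = stage.get("inputStage")
    return stages

# Same predicate as the aggregation's $match stage:
# explain_doc = db.collection.find({
#     "nodeLoc": {"$geoWithin": {"$geometry": square}},
#     "t": {"$gt": tMin, "$lt": tMax},
# }).explain()
# winning_plan_stages(explain_doc)  # COLLSCAN here means no index was used
```

If COLLSCAN shows up, the 16 seconds are spent scanning documents rather than in $group.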


Solution

  • The $group stage will not use an index, in my opinion; it is a very expensive scan over all matched values.

    The only idea I can offer is to narrow the $match criteria so each query operates on a smaller data set, and then aggregate the final result in the app.

    If there are, say, 5 days between tMin and tMax, a faster response could be achieved by running the query in 5 rounds, then merging the partial results and scanning for the min and max across the 20 resulting values.

    Does this make sense to you?

    Any comments welcome!
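    The round-based idea above could be sketched in pymongo like this. `split_interval` and `merge_extents` are hypothetical helpers I'm introducing for illustration; the pipeline itself follows the question's:

    ```python
    def split_interval(t_min, t_max, chunks):
        """Split [t_min, t_max) into `chunks` equal sub-intervals (works for datetimes too)."""
        step = (t_max - t_min) / chunks
        return [(t_min + i * step, t_min + (i + 1) * step) for i in range(chunks)]

    def merge_extents(partials):
        """Merge per-round {minLat, maxLat, minLon, maxLon} results in the app."""
        return {
            "minLat": min(p["minLat"] for p in partials),
            "maxLat": max(p["maxLat"] for p in partials),
            "minLon": min(p["minLon"] for p in partials),
            "maxLon": max(p["maxLon"] for p in partials),
        }

    def bounding_box(db, square, t_min, t_max, chunks=5):
        """Run the question's aggregation once per sub-interval, then merge."""
        partials = []
        for lo, hi in split_interval(t_min, t_max, chunks):
            cursor = db.collection.aggregate([
                {"$match": {"nodeLoc": {"$geoWithin": {"$geometry": square}},
                            "t": {"$gt": lo, "$lt": hi}}},
                {"$group": {"_id": {},
                            "minLat": {"$min": {"$arrayElemAt": ["$nodeLoc.coordinates", 1]}},
                            "maxLat": {"$max": {"$arrayElemAt": ["$nodeLoc.coordinates", 1]}},
                            "minLon": {"$min": {"$arrayElemAt": ["$nodeLoc.coordinates", 0]}},
                            "maxLon": {"$max": {"$arrayElemAt": ["$nodeLoc.coordinates", 0]}}}},
            ])
            partials.extend(cursor)  # empty sub-intervals contribute nothing
        return merge_extents(partials) if partials else None
    ```

    Whether this is actually faster depends on whether the narrower time predicate lets each round use the index effectively; the merge itself is cheap.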