Search code examples
mongodbmongodb-querylimitaggregation-frameworkmongodb-aggregation

no affect of limit on cursor time aggrgate mongoDB


I am aggregating Data on a collection having 1 million records. Match query uses index. Find Code reference below -

    AggregateIterable<Document> aggregateIterable = timeCollection.aggregate(Arrays.asList(match, project,group)).batchSize(1000).allowDiskUse(true);
    long curStartTs = Calendar.getInstance().getTimeInMillis();
    MongoCursor<Document> cursor = aggregateIterable.iterator(); //this line roughly takes 15 seconds
    long curEndTs = Calendar.getInstance().getTimeInMillis();
    System.out.println("Cursor time - " + (curEndTs - curStartTs));

The final resultant list contains 3500 records.

Now I am limiting the records by passing $limit in the aggregate pipeline as -

    Document limitParam = new Document("$limit",30);
    AggregateIterable<Document> aggregateIterable = timeCollection.aggregate(Arrays.asList(match, project,group,limitParam)).batchSize(1000).allowDiskUse(true);
    long curStartTs = Calendar.getInstance().getTimeInMillis();
    MongoCursor<Document> cursor = aggregateIterable.iterator(); //this line still taking around 15 seconds
    long curEndTs = Calendar.getInstance().getTimeInMillis();
    System.out.println("Cursor time - " + (curEndTs - curStartTs));

The final resultant list contains now only 30 records.

I am not able to understand that why there is no time variation in two cases. Even after providing the limit in pipeline, why aggregateIterable.iterator() is taking the same time as that of the case when there is no limit in pipeline?

Thanks a lot in advance.

kind regards,

Vibhav


Solution

  • Aggregation $limit has no effect on the content of the documents it passes.

    By Seeing your code

    long curStartTs = Calendar.getInstance().getTimeInMillis();
    MongoCursor<Document> cursor = aggregateIterable.iterator(); //this line roughly takes 15 seconds
    long curEndTs = Calendar.getInstance().getTimeInMillis();
    System.out.println("Cursor time - " + (curEndTs - curStartTs));
    

    You are trying to find the time taken just for executing the query.

    To get a better idea of how much time actually taken in MongoDB to execute these queries we can execute the same queries in mongo shell with explain

    Sample Queries

    Without Limit

    db.foo.aggregate([ { 'conditions' }], {explain: true})
    

    With Limit

    db.foo.aggregate([{ 'conditions' }, {$limit: 10}], {explain: true})
    

    Also you may need to look on Performance of MongoDB query , Optimize Query, Analyze Query Plan and cursor limit

    Hope it helps!