mongodb, sorting, mongodb-indexes

Is there a way to make MongoDB use an index that does not quite fit, but lets it sort results blockwise rather than all together?


I have a collection test with a compound index on two fields:

db.test.createIndex({ i: 1, j: 1 })

When I execute the following pipeline

db.test.aggregate([{ $sort: { i: 1, j: 1 } }], { allowDiskUse: false })

it works fine. But this pipeline

db.test.aggregate([{ $sort: { i: 1, j: -1 } }], { allowDiskUse: false })

fails with the error "Sort exceeded memory limit". The reason is more or less clear: the sort order in the pipeline does not match the order in the index, so MongoDB decides not to use the index and instead sorts the whole collection, which does not fit in memory.

However, I suspect that MongoDB could be slightly smarter. Instead of sorting the whole collection, it could use the index to delimit blocks of documents that share the same value of field i, and then sort documents only within each block. The documents of a single block are more likely to fit in memory, so the pipeline could run more efficiently. Can I make the MongoDB server do this? How? If not, what prevents it?


Solution

  • A similar question was asked a few days later here. As @Tom Slabbaert mentioned in the comments, the answer is no: at the time of writing, MongoDB does not appear to support using the index in the situation described to provide an incremental sort. There is no (non-hacky) way to force the system to do this, especially in a way that would be flexible and deliver performance benefits.
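For illustration only, the blockwise idea from the question can be emulated on the client side, which is the kind of hacky workaround meant above. The sketch below is not a MongoDB feature: sortBlockwise and compareJDescending are hypothetical helper names, and it assumes the input is any iterable of documents already ordered by i, with i and j holding plain comparable values such as numbers.

```javascript
// Hypothetical helper (not a MongoDB API): takes documents that are already
// ordered by `i` and re-emits them so that, within each run of equal `i`,
// they are ordered by `j` descending. Only one block is buffered at a time.
function* sortBlockwise(docs) {
  let block = [];
  for (const doc of docs) {
    // A change in `i` closes the current block: sort it and emit it.
    if (block.length > 0 && block[0].i !== doc.i) {
      block.sort(compareJDescending);
      yield* block;
      block = [];
    }
    block.push(doc);
  }
  // Emit the final block (a no-op when the input was empty).
  block.sort(compareJDescending);
  yield* block;
}

// Comparator for `j` in descending order; assumes `j` values are comparable
// with the < and > operators.
function compareJDescending(a, b) {
  if (a.j < b.j) return 1;
  if (a.j > b.j) return -1;
  return 0;
}
```

The input could come from a query sorted only on i, such as db.test.find().sort({ i: 1 }), which the { i: 1, j: 1 } index can serve because i is its prefix. The sorting then happens in the client rather than on the server, and each individual block still has to fit in client memory.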

    Some additional things to consider with respect to the presumed goal of improved performance:

    • What's the end result you're trying to achieve here? Is there a particular reason that the compound sort is necessary and/or that the index couldn't be adjusted (to have j in descending order so that it supports the sort)?
    • The sample pipelines explicitly have allowDiskUse set to false. Is there a reason for that? Setting it to true should allow the operation to complete successfully.
    • Relatedly, beginning in version 6.0, allowDiskUse defaults to true (controlled by the allowDiskUseByDefault server parameter).
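The two practical options raised in the points above might look roughly like this. It is a sketch under assumptions: db stands for the mongosh database handle, and sortWithMatchingIndex and sortWithDiskUse are hypothetical wrapper names used only to keep the two options apart.

```javascript
// Option 1 (sketch): add an index whose key directions match the sort, so the
// server can return documents in index order without a blocking sort.
// `db` is assumed to be the mongosh database handle.
function sortWithMatchingIndex(db) {
  db.test.createIndex({ i: 1, j: -1 });
  return db.test.aggregate([{ $sort: { i: 1, j: -1 } }], { allowDiskUse: false });
}

// Option 2 (sketch): keep the existing { i: 1, j: 1 } index and allow the
// blocking sort to write temporary files to disk instead of failing.
function sortWithDiskUse(db) {
  return db.test.aggregate([{ $sort: { i: 1, j: -1 } }], { allowDiskUse: true });
}
```

The first option costs an extra index to store and maintain, while the second keeps the indexes unchanged and accepts a slower sort that spills to disk.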

    Edit: Per the comments, the request for this functionality in MongoDB appears to be tracked either here or here.