Search code examples
mongodbsortingindexingmultikey

mongodb multikey index seems like not being used when sort


Let's assume that I have tx_collection which has 3 documents like below

{
    "block_number": 1,
    "value": 122
    "transfers": [
        {
            "from": "foo1", 
            "to": "bar1", 
            "amount": 111
        },
        {
            "from": "foo3", 
            "to": "bar3", 
            "amount": 11
        },
    ]
},
{
    "block_number": 2,
    "value": 88
    "transfers": [
        {
            "from": "foo11", 
            "to": "bar11", 
            "amount": 33
        },
        {
            "from": "foo22", 
            "to": "bar22", 
            "amount": 55
        },
    ]
},
{
    "block_number": 3,
    "value": 233
    "transfers": [
        {
            "from": "foo1", 
            "to": "bar1", 
            "amount": 33
        },
        {
            "from": "foo3", 
            "to": "bar3", 
            "amount": 200
        },
    ]
}

For the performance issue, I create multikey index on transfers.amount

When I sort by transfers.amount,

db.getCollection('tx_transaction').find({}).sort({"transfers.amount":-1})

what I expected order of documents is sorted by max value of subfield transfers.amount like

{
    "block_number": 3,
    "value": 233
    "transfers": [
        {
            "from": "foo1", 
            "to": "bar1", 
            "amount": 33
        },
        {
            "from": "foo3", 
            "to": "bar3", 
            "amount": 200
        },
    ]
},
{
    "block_number": 1,
    "value": 122
    "transfers": [
        {
            "from": "foo1", 
            "to": "bar1", 
            "amount": 111
        },
        {
            "from": "foo3", 
            "to": "bar3", 
            "amount": 11
        },
    ]
},
{
    "block_number": 2,
    "value": 88
    "transfers": [
        {
            "from": "foo11", 
            "to": "bar11", 
            "amount": 33
        },
        {
            "from": "foo22", 
            "to": "bar22", 
            "amount": 55
        },
    ]
}

The sort works well since there are only 3 documents. Sorted order is block number 3 -> block number 1 -> block_number 2 which I expected

My issue is that when there is 19 million documents, it throws error message

The massage is like

"errmsg" : "Executor error during find command: OperationFailed: Sort operation used more than the maximum 33554432 bytes of RAM. Add an index, or specify a smaller limit.",

It seems that multikey index is not used when sort.

do you have any idea why this error message is thrown?

JFYI.

  • My mongodb version is 3.6.3
  • tx_collection is sharded

Solution

  • As of MongoDB 3.6 and newer, I think this is to be expected as mentioned in Use Indexes to Sort Query Results where it stated:

    As a result of changes to sorting behavior on array fields in MongoDB 3.6, when sorting on an array indexed with a multikey index the query plan includes a blocking SORT stage. The new sorting behavior may negatively impact performance.

    In a blocking SORT, all input must be consumed by the sort step before it can produce output. In a non-blocking, or indexed sort, the sort step scans the index to produce results in the requested order.

    In other words, "blocking sort" means the presence of the SORT_KEY_GENERATOR stage, the stage that means in-memory sort. This was changed from pre-3.6 MongoDB due to SERVER-19402 to address the inconsistencies around sorting an array field.

    There is a ticket to improve this situation: SERVER-31898. Unfortunately there is no workaround for this behaviour just yet.