Search code examples
mongodbmongodb-indexes

What is the _id_hashed index for in mongoDB?


I sharded my mongoDB cluster by hashed _id. I checked the index size, there lies an _id_hashed index which is taking much space:

   "indexSizes" : {
           "_id_" : 14060169088,
           "_id_hashed" : 9549780576
    },

mongoDB manual says that an index on the sharded key is created if you shard a collection. I guess that is the reason the _id_hashed index is out there.

My question is : what is the _id_hashed index for if I only query document by the _id field? can I delete it? as it takes too much space.

ps: it seems mongoDB use the _id index when query, not the _id_hashed index. execution plan for a query:

   "clusteredType" : "ParallelSort",
    "shards" : {
            "rs1/192.168.62.168:27017,192.168.62.181:27017" : [
                    {
                            "cursor" : "BtreeCursor _id_",
                            "isMultiKey" : false,
                            "n" : 0,
                            "nscannedObjects" : 0,
                            "nscanned" : 1,
                            "nscannedObjectsAllPlans" : 0,
                            "nscannedAllPlans" : 1,
                            "scanAndOrder" : false,
                            "indexOnly" : false,
                            "nYields" : 0,
                            "nChunkSkips" : 0,
                            "millis" : 0,
                            "indexBounds" : {
                                    "start" : {
                                            "_id" : "spiderman_task_captainStatus_30491467_2387600"
                                    },
                                    "end" : {
                                            "_id" : "spiderman_task_captainStatus_30491467_2387600"
                                    }
                            },
                            "server" : "localhost:27017"
                    }
            ]
    },
    "cursor" : "BtreeCursor _id_",
    "n" : 0,
    "nChunkSkips" : 0,
    "nYields" : 0,
    "nscanned" : 1,
    "nscannedAllPlans" : 1,
    "nscannedObjects" : 0,
    "nscannedObjectsAllPlans" : 0,
    "millisShardTotal" : 0,
    "millisShardAvg" : 0,
    "numQueries" : 1,
    "numShards" : 1,
    "indexBounds" : {
            "start" : {
                    "_id" : "spiderman_task_captainStatus_30491467_2387600"
            },
            "end" : {
                    "_id" : "spiderman_task_captainStatus_30491467_2387600"
            }
    },
    "millis" : 574

Solution

  • MongoDB uses a range based sharding approach. If you choose to use hashed based sharding, you must have a hashed index on the shard key and cannot drop it since it will be used to determine shard to use for any subsequent queries ( note that there is an open ticket to allow you to drop the _id index once hashed indexes are allowed to be unique SERVER-8031 ).

    As to why the query appears to be using the _id index rather than the _id_hashed index - I ran some tests and I think the optimizer is choosing the _id index because it is unique and results in a more efficient plan. You can see similar behavior if you shard on another key that has a pre-existing unique index.