We started with a single MongoDB instance, but now one collection has grown to ~300GB. The collection contains objects which have a date field, and we mostly need to query the more recent objects rather than the historic ones. So my question is: is it possible to shard this collection on one server by a date field? More explicitly, I would like to shard more recent objects onto one node and older objects onto another node, instead of equally distributing all the objects across n shards.
Also, is there a tutorial on how to shard an existing single database (without any replica sets) into a sharded cluster?
Technically, you don't need to shard your data for this; you just need to index the date field. You can create an index on the date field, and you can confirm that your queries use it by inspecting the query plan with db.collection.explain("executionStats").
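As a rough illustration of the index-plus-explain approach, here is a minimal Python sketch. It assumes the PyMongo driver, an unsharded deployment on MongoDB 3.0 or later (where explain output has a queryPlanner.winningPlan section), and a field literally named "date"; the helper names recent_filter and explain_recent are made up for this example.

```python
from datetime import datetime, timedelta, timezone


def recent_filter(days, now=None):
    """Build a filter matching documents whose 'date' is within the last `days` days.

    'date' is an assumed field name; substitute the real one.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return {"date": {"$gte": now - timedelta(days=days)}}


def explain_recent(coll, days=30):
    """Index the date field and return the winning plan for a recent-documents query.

    `coll` is expected to be a pymongo Collection (assumption).
    """
    # -1 is the same value as pymongo.DESCENDING: newest documents first.
    coll.create_index([("date", -1)])
    plan = coll.find(recent_filter(days)).explain()
    # If the index is used, the winning plan should contain an IXSCAN stage.
    return plan["queryPlanner"]["winningPlan"]
```

With a real connection you would call something like explain_recent(MongoClient()["mydb"]["mycollection"]), where both names are placeholders, and look for an IXSCAN stage rather than a COLLSCAN in the result.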
However, if you do shard, choosing the shard key is very important. There are a few things to consider when choosing it:
- Write scaling (high cardinality, randomization)
- Query isolation (reads)
Choosing the date field gives very high cardinality, but it fails on randomization: because dates increase monotonically, all newly inserted documents are written to a single shard, which limits the write capacity of the system. For the same reason, using ObjectId as the shard key is discouraged.
Quoting from http://docs.mongodb.org/manual/core/sharding-shard-key/:
"MongoDB generates ObjectId values upon document creation to produce a unique identifier for the object. However, the most significant bits of data in this value represent a time stamp, which means that they increment in a regular and predictable pattern. Even though this value has high cardinality, when using this, any date, or other monotonically increasing number as the shard key, all insert operations will be storing data into a single chunk, and therefore, a single shard. As a result, the write capacity of this shard will define the effective write capacity of the cluster."
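To make the single-chunk effect concrete, here is a small self-contained Python simulation. It does not talk to MongoDB at all. It assumes three range chunks split on made-up date boundaries, each living on its own shard, and uses md5 purely as a stand-in for a randomizing (hashed) key; real clusters also split and migrate chunks over time, which this toy model ignores.

```python
import bisect
import hashlib
from collections import Counter
from datetime import datetime, timedelta


def range_chunk(key, boundaries):
    """Index of the range chunk holding `key`, given sorted split points."""
    return bisect.bisect_right(boundaries, key)


def hashed_chunk(key, num_chunks):
    """Chunk chosen by hashing the key (md5 is only an illustrative hash)."""
    digest = hashlib.md5(str(key).encode()).hexdigest()
    return int(digest, 16) % num_chunks


# Hypothetical split points: chunk 0 = before 2013, chunk 1 = 2013, chunk 2 = 2014 onwards.
boundaries = [datetime(2013, 1, 1), datetime(2014, 1, 1)]

# 1000 new inserts whose dates keep increasing, as they would for "now" timestamps.
start = datetime(2014, 6, 1)
new_dates = [start + timedelta(minutes=i) for i in range(1000)]

range_counts = Counter(range_chunk(d, boundaries) for d in new_dates)
hash_counts = Counter(hashed_chunk(d, 3) for d in new_dates)

print("range on date:", dict(range_counts))  # every insert lands in the last chunk
print("hashed key:   ", dict(hash_counts))   # inserts spread over all three chunks
```

With the date as a range key, every new insert falls into the chunk covering the highest dates, so one shard takes all the writes. The trade-off is that a randomized key spreads writes but gives up query isolation for date-range reads, which then have to touch every shard.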