The circumstance:
Previously, I had three machines: 10.10.10.5, 10.10.10.6, and 10.10.10.7.
10.10.10.5 runs:
10.10.10.6 runs:
10.10.10.7 runs:
My application connects to the mongos on 10.10.10.6.
Everything ran well for about a year. Then 10.5 and 10.6 came under very heavy load, especially 10.6. The CPU usage and load average were very high, so I planned to add two new machines to the cluster.
I created two shards: shard1 and shard2. The new machine 10.10.10.8 runs:
The new machine 10.10.10.9 runs:
On the old member 10.10.10.7 I also added the arbiters for shard1 and shard2.
The problem is that when I added the two new machines (using the addShard command), the migration appeared to finish about 5 hours later (though I can't be sure of that), and then the 10.10.10.6 host once again came under extremely high load, with a load average of about 90.5 (4 CPUs).
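For reference, this is roughly how shards are added from the mongos shell on 1.8; the replica set names and ports below are illustrative, not my exact values:

// run against the admin database via mongos;
// replica-set shards are specified as <setName>/<seedHost:port>
db.getSiblingDB("admin").runCommand({ addshard : "shard1/10.10.10.8:27018" })
db.getSiblingDB("admin").runCommand({ addshard : "shard2/10.10.10.9:27018" })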
Meanwhile there are many read and write requests from the application to the mongos on 10.10.10.6, but little or no data is being written to the two new machines. I used iostat and found almost no I/O activity on the two new machines.
Why is 10.10.10.6 so highly loaded?
Previously, even at peak times, the highest load average was about 30.5.
So could you guys kindly advise how to fix the load issue and get the two new machines up and running?
Edit: More information about my environment
10.5, 10.6, 10.7, 10.8, and 10.9 all have the same resources: 4 CPUs, 6 GB of memory, and 150 GB of disk space; the network is optical fiber.
Shard3's data size is 16 GB and shard4's data size is 15 GB.
I am using MongoDB 1.8.2.
Edit: After discussion in chat
It is expected that there will be some overhead when adding new shards, at least initially. This is because chunk migrations need to take place, and these use CPU, disk, and network I/O, which adds load to your environment.
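You can watch those migrations happen by querying the changelog collection in the config database. A minimal sketch, run from the mongos shell:

// recent chunk-migration events, newest first;
// entries look like moveChunk.start / moveChunk.commit
db.getSiblingDB("config").changelog.find(
    { what : /moveChunk/ }
).sort({ time : -1 }).limit(5).forEach(printjson)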
If your read preference is set to read from secondaries, the 10.6 server could quickly become overloaded as it tries to keep up with the replication for two replica sets (which will be increased due to chunk migration) and the traffic from the application itself. Potentially this could be reduced by adding more secondaries, but you need to test this in an environment that closely mimics your production environment.
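Note that in the 1.8 era, reads from secondaries are enabled per connection with slaveOk rather than a modern read preference. A minimal illustration from the shell (the collection name is hypothetical):

// allow this connection to read from secondaries
db.getMongo().setSlaveOk()
// this read may now be served by a secondary
db.orders.find().limit(1)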
Adding more shards could potentially help as well, but again you need to test this thoroughly. It would appear that when you added shards previously the chunk migrations did not complete, so the new shards did not take on as much of the load as they should have. If you add shards again in the future, make sure that the chunks have finished migrating by checking db.getSiblingDB("config").locks.find({"_id" : "balancer"}) and the output of db.printShardingStatus() to see that the number of chunks is equal across all shards.
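As a quick check you can also count the chunks per shard directly in the config database; a sketch assuming the shard IDs match the names used above:

// print the number of chunks assigned to each shard;
// a roughly even spread means the balancer has caught up
["shard1", "shard2", "shard3", "shard4"].forEach(function(s) {
    var n = db.getSiblingDB("config").chunks.find({ shard : s }).count();
    print(s + ": " + n + " chunks");
});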
Some more general notes:
In production it is inadvisable to run only a single config server: if you lose it, the cluster becomes unusable. The recommended setup is three config servers; see the MongoDB sharding documentation for more details.
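For illustration, a three-config-server topology is started roughly like this (the ports and dbpaths here are hypothetical):

# one config server on each of three machines
mongod --configsvr --port 27019 --dbpath /data/configdb
# every mongos then lists all three config servers
mongos --configdb 10.10.10.5:27019,10.10.10.6:27019,10.10.10.7:27019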
Generally speaking, it is not recommended to run two mongod instances on the same machine. The two processes will compete for the resources they share, and this is especially true when using memory-mapped files, as MongoDB does.
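You can see how much memory a given mongod has mapped and resident via serverStatus; a minimal check from the shell:

// memory figures (in MB) for the mongod this shell is connected to;
// with two mongods per box, the resident sets compete for the same 6 GB
printjson(db.serverStatus().mem)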
You can determine which queries and processes are causing the most load by using some built-in tools. mongostat and mongotop are two command-line utilities that allow you to track the usage of MongoDB. (Edit: mongotop is not available in 1.8.2.)
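For example, to poll a server every 5 seconds (the host and interval are illustrative):

mongostat --host 10.10.10.6 5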
Inside the console you can also run db.currentOp() to get more information on the current operations. You can tell what the balancer is doing (whether or not it is currently in a balancing round) by issuing db.getSiblingDB("config").locks.find({"_id" : "balancer"}) from the console.
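A sketch of filtering the db.currentOp() output for long-running operations (the 5-second threshold is arbitrary):

// db.currentOp() returns { inprog : [ ... ] };
// print anything that has been running for more than 5 seconds
db.currentOp().inprog.forEach(function(op) {
    if (op.secs_running && op.secs_running > 5) printjson(op);
});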
You are running a very old version of MongoDB. You should plan to upgrade: if not to the latest stable release (2.2.0) or the most recent release of the 2.0 branch (2.0.7), then at least to the last stable of the branch you are on (1.8.5). There have been numerous fixes and improvements made to the product since the release you are currently using, which will provide many benefits.
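Before planning the upgrade, you can confirm what each node is actually running from the shell:

// version and build details of the connected server
db.version()
printjson(db.serverBuildInfo())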