ruby-on-rails mongodb ubuntu mongo-shell batch-insert

How to increase 'db.collection.insert()' batch insertion speed of 100K MongoDB objects

On my Ubuntu server I have a Ruby on Rails app that relies on MongoDB. I often use Mongoid to inject objects into the DB, but when injecting a large number of objects I compile a huge array of hashes and inject it with the mongo shell method db.collection.insert():

ObjectName.collection.insert([{_id: BSON::ObjectId('5671329e4368725951010000'), name: "foo"}, {_id: BSON::ObjectId('567132c94368725951020000'), name: "bar"}])

The batch insertion time is a bottleneck for me. For example, it takes 23 seconds to batch insert 150,000 objects. Is it possible to allocate resources in a way that makes batch insertion faster?

Solution

You can try this with the Mongoid gem:

batch = [{_id: BSON::ObjectId('5671329e4368725951010000'), name: "foo"}, {_id: BSON::ObjectId('567132c94368725951020000'), name: "bar"}]

Post.collection.insert(batch) # assuming Post is the model

Or you can use the Ruby MongoDB driver directly:

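require 'mongo'

# Legacy 1.x Ruby driver API (Mongo::MongoClient); connects to localhost:27017 by default.
mongo_client = Mongo::MongoClient.new
coll = mongo_client['test_db']['test_collection']

# Queue every document from `batch` (the array of hashes built above), then send them as one bulk operation.
bulk = coll.initialize_ordered_bulk_op
batch.each do |hash|
  bulk.insert(hash)
end
bulk.execute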

If you prefer to do this as a mongo shell query, the same approach applies; see Bulk Insert.
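Batch size is one thing that may be worth measuring before changing anything else. Below is a minimal sketch, not a definitive implementation: insert_in_batches is a hypothetical helper, and the lambda is a stand-in for the real call (for example ->(slice) { Post.collection.insert(slice) }), so the snippet runs without a MongoDB server. The timings it prints say nothing about MongoDB itself; swap in the real insert call to get meaningful numbers.

```ruby
require 'benchmark'

# Hypothetical helper: feeds `docs` to `inserter` in slices of `batch_size`.
# `inserter` is any callable that accepts an array of hashes.
def insert_in_batches(docs, batch_size, inserter)
  inserted = 0
  docs.each_slice(batch_size) do |slice|
    inserter.call(slice)
    inserted += slice.size
  end
  inserted
end

# Stand-in for the database (an assumption, so the sketch is self-contained).
fake_collection = []
fake_inserter = ->(slice) { fake_collection.concat(slice) }

docs = Array.new(150_000) { |i| { name: "object-#{i}" } }

# Time the same data set with two different batch sizes.
[1_000, 10_000].each do |size|
  fake_collection.clear
  seconds = Benchmark.realtime { insert_in_batches(docs, size, fake_inserter) }
  puts format('batch_size=%d: %.3fs', size, seconds)
end
```

With the real collection in place of fake_inserter, the same loop shows whether a few large batches or many smaller ones work better on your server.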

To handle a growing amount of data, you can use sharding:

Sharding is the process of storing data records across multiple machines and is MongoDB’s approach to meeting the demands of data growth. As the size of the data increases, a single machine may not be sufficient to store the data nor provide an acceptable read and write throughput. Sharding solves the problem with horizontal scaling. With sharding, you add more machines to support data growth and the demands of read and write operations.

Vertical vs. horizontal scaling

Vertical scaling adds more CPU and storage resources to increase capacity. Scaling by adding capacity has limitations: high performance systems with large numbers of CPUs and large amount of RAM are disproportionately more expensive than smaller systems. Additionally, cloud-based providers may only allow users to provision smaller instances. As a result there is a practical maximum capability for vertical scaling. Sharding, or horizontal scaling, by contrast, divides the data set and distributes the data over multiple servers, or shards. Each shard is an independent database, and collectively, the shards make up a single logical database.
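To make the idea of dividing a data set across shards concrete, here is a toy sketch. It is only an illustration under stated assumptions: the shard names and the shard_for helper are hypothetical, and the CRC32-modulo rule is a stand-in, not MongoDB's actual routing (in a real cluster the shard key and the mongos router decide where each document goes).

```ruby
require 'zlib'

# Hypothetical shard names, for illustration only.
SHARDS = %w[shard0 shard1 shard2].freeze

# Toy routing rule: hash the key and take it modulo the number of shards.
def shard_for(key, shards = SHARDS)
  shards[Zlib.crc32(key.to_s) % shards.size]
end

docs = Array.new(9_000) { |i| { _id: i, name: "object-#{i}" } }

# Each shard ends up holding only a subset of the documents.
by_shard = docs.group_by { |doc| shard_for(doc[:_id]) }
by_shard.each { |shard, subset| puts "#{shard}: #{subset.size} documents" }
```

Every document lands on exactly one shard, so each machine stores and writes only part of the data set, which is the point of horizontal scaling.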