I'm doing a bulk-insert into a mongodb database. I know that 99% of the records inserted will fail because of a duplicate key error. I would like to print after the insert how many new records were inserted into the database. All this is being done in python through the tornado motor mongodb driver, but probably this doesn't matter much.
try:
bulk_write_result = yield db.collections.probe.insert(dataarray, continue_on_error=True)
nr_inserts = bulk_write_result["nInserted"]
except pymongo.errors.DuplicateKeyError as e:
nr_inserts = ???? <--- what should I put here?
Since an exception was thrown, bulk_write_result
is empty. Obviously I can (except for concurrency issues) do a count of the full collection before and after the insert, but I don't like the extra roundtrips to the database for just a line in the logfile. So is there any way I can discover how many records were actually inserted?
Regular insert with continue_on_error can't report the info you want. If you're on MongoDB 2.6 or later, however, we have a high-performance solution with good error reporting. Here's a complete example using Motor's BulkOperationBuilder:
import pymongo.errors
from tornado import gen
from tornado.ioloop import IOLoop
from motor import MotorClient
db = MotorClient()
dataarray = [{'_id': 0},
{'_id': 0}, # Duplicate.
{'_id': 1}]
@gen.coroutine
def my_insert():
try:
bulk = db.collections.probe.initialize_unordered_bulk_op()
# Prepare the operation on the client.
for doc in dataarray:
bulk.insert(doc)
# Send to the server all at once.
bulk_write_result = yield bulk.execute()
nr_inserts = bulk_write_result["nInserted"]
except pymongo.errors.BulkWriteError as e:
print(e)
nr_inserts = e.details['nInserted']
print('nr_inserts: %d' % nr_inserts)
IOLoop.instance().run_sync(my_insert)
Full documentation: http://motor.readthedocs.org/en/stable/examples/bulk.html
Heed the warning about poor bulk insert performance on MongoDB before 2.6! It'll still work but requires a separate round-trip per document. In 2.6+, the driver sends the whole operation to the server in one round trip, and the server reports back how many succeeded and how many failed.