mongodb pymongo

Make a MongoDB/PyMongo query start executing immediately instead of waiting for the first iteration


coll = MongoClient(uri)[database][collection]
cursor = coll.find(query, sort=sort) # I want the query to start executing here!
# It doesn't matter to my application if we're stuck here for a couple of seconds
# ...
doc = next(cursor) # this must not be slow!

Note: I don't want to fetch all documents right away, only the first document or batch of documents, so that the first iteration doesn't pay for the lazy loading.


Solution

  • The cursor has an internal _refresh method that fetches the first batch of documents into the cursor's internal cache.

    Test script:

    from pymongo import MongoClient
    from datetime import datetime
    import time
    
    connection_string = "mongodb://localhost:27017"
    print("start", datetime.now())
    # No TLS options are needed for a plain localhost connection
    # (the old ssl_cert_reqs option was removed in PyMongo 4).
    client = MongoClient(connection_string)
    print("after connect", datetime.now())
    
    db = client['test']
    collection = db['companies']
    
    documents = collection.find({}).limit(3)
    print("after find", datetime.now())
    
    documents._refresh()  # <== THIS: internal API, sends the query and caches the first batch
    
    print("after refresh", datetime.now())
    
    time.sleep(2)
    for document in documents:
        time.sleep(1)
        print("next document", datetime.now())
        print(document['_id'])
    
    print("end", datetime.now())
    

    In the MongoDB profiler, the timestamp of the query reaching the database matches the _refresh call.


    Disclaimer: use internal methods with care, and always cover them with tests. They might change without notice.
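
If depending on an internal method is a concern, a similar effect can probably be had with public API only: pull the first document eagerly, then chain it back in front of the rest. This is a different technique from _refresh, offered as a minimal sketch. The helper name prefetch and the stand-in generator fake_cursor are hypothetical and not part of PyMongo; the stand-in only imitates a cursor that does no work until its first iteration.

```python
from itertools import chain


def prefetch(cursor):
    """Trigger the first fetch now and return an iterator over all results.

    Works on any iterator. For a lazy cursor, the first next() call is what
    sends the query, so calling it here moves that cost to this point.
    """
    it = iter(cursor)
    try:
        first = next(it)  # blocks here until the first result is available
    except StopIteration:
        return iter(())  # empty result set: nothing to chain back
    # Put the already-fetched item back in front of the remaining ones.
    return chain([first], it)


if __name__ == "__main__":
    events = []

    def fake_cursor():
        # Hypothetical stand-in for a lazy cursor: no work until first next().
        events.append("query sent")
        yield {"_id": 1}
        yield {"_id": 2}

    docs = prefetch(fake_cursor())
    print(events)  # the "query" has already run at this point
    for doc in docs:
        print(doc["_id"])
```

With PyMongo this would be called as docs = prefetch(coll.find(query, sort=sort)). The return value is a plain iterator rather than a Cursor, so cursor methods such as limit have to be applied before the call.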