
pymongo insert vs pymysql insert


I am testing MongoDB and MySQL through their Python connectors, pymongo and pymysql, focusing on insert performance. Versions: pymongo 3.4, pymysql 0.7.9, Python 3.5.

For MongoDB, my code looks like this:

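    import time
    import bson.son
    from pymongo import MongoClient

    client = MongoClient('localhost', 27017)
    db = client['local']
    collection = db['cqt']

    for i in range(0, 10):
        datas = [{'a': 1, 'b': 2}, {'a': 3, 'b': 4}, ...]  # 2000 dicts
        # SON keeps key order; insert_many also accepts plain dicts
        data = [bson.son.SON(d) for d in datas]
        deltaT = time.perf_counter()  # wall-clock timer (time.clock() was removed in Python 3.8)
        collection.insert_many(data)
        deltaT = time.perf_counter() - deltaT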

And for MySQL:

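    import time
    import pymysql
    import pymysql.cursors

    connection = pymysql.connect(host='127.0.0.1',
                                 user='admin',
                                 password='toto',
                                 db='cqt',
                                 charset='utf8mb4',
                                 cursorclass=pymysql.cursors.DictCursor)

    cursor = connection.cursor()
    sqlRequest = """INSERT INTO `cqt`.`myTable` (`a`,`b`) VALUES """

    # getSQL() (not shown) renders one row as a VALUES tuple such as (1,2);
    # join() avoids a trailing comma after the last row
    sql = sqlRequest + ",".join(data.getSQL() for data in datas)  # 2000 rows
    deltaT = time.perf_counter()
    cursor.execute(sql)
    deltaT = time.perf_counter() - deltaT
    connection.commit()  # pymysql does not autocommit by default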
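Building the VALUES list by string concatenation is easy to get wrong (a stray trailing comma, unescaped values). A minimal sketch, assuming every row is a dict with the same keys, builds one multi-row INSERT with %s placeholders (pymysql's parameter style) plus a flat parameter list, so that pymysql does the escaping. The function name build_bulk_insert is hypothetical.

```python
from typing import Any, Dict, List, Sequence, Tuple


def build_bulk_insert(table: str,
                      rows: Sequence[Dict[str, Any]]) -> Tuple[str, List[Any]]:
    """Return (sql, params) for a single multi-row INSERT using %s placeholders."""
    if not rows:
        raise ValueError("rows must not be empty")
    columns = list(rows[0].keys())
    # Table and column names are interpolated, not escaped: they must come from trusted code.
    col_list = ", ".join("`%s`" % c for c in columns)
    row_placeholder = "(" + ", ".join(["%s"] * len(columns)) + ")"
    sql = "INSERT INTO %s (%s) VALUES %s" % (
        table, col_list, ", ".join([row_placeholder] * len(rows)))
    # Flatten the values row by row, in the same column order as col_list.
    params = [row[c] for row in rows for c in columns]
    return sql, params
```

With the connection above, this would be used as sql, params = build_bulk_insert("myTable", datas), then cursor.execute(sql, params) and connection.commit(). Alternatively, pymysql's cursor.executemany("INSERT INTO myTable (a, b) VALUES (%s, %s)", rows) with a list of tuples already rewrites a single-row INSERT into multi-row batches.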

The data are simple dictionaries: 6 columns, each with a string key and an int value.
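To repeat the measurement with identical data in both databases, rows of this shape can be generated deterministically. A minimal sketch, assuming six fields with hypothetical key names f0 to f5 (rename them to match the real column names) and random non-negative integer values:

```python
import random
from typing import Dict, List


def make_rows(n: int = 2000, n_fields: int = 6, seed: int = 0) -> List[Dict[str, int]]:
    """Generate n flat dicts with n_fields string keys (hypothetical f0, f1, ...) and int values."""
    rng = random.Random(seed)  # seeded, so MongoDB and MySQL runs receive the same rows
    keys = ["f%d" % i for i in range(n_fields)]
    return [{k: rng.randint(0, 1000000) for k in keys} for _ in range(n)]
```

Calling make_rows with the same seed for both databases gives them exactly the same rows to insert.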

When I plot the deltaT variable (the time required to insert 2000 entries), I'm surprised to find that MongoDB is much slower than MySQL, by a factor of 10.

[Plot: insert duration results]
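For per-batch timings like deltaT, time.perf_counter() is the appropriate clock: it measures elapsed wall-clock time at the highest available resolution. time.clock() was deprecated in Python 3.3 and removed in 3.8, and on Unix it returned processor time, so time spent blocked waiting for the server's reply was not counted. A minimal sketch of a timing helper (the name time_batches is hypothetical) that works with any insert function:

```python
import time
from typing import Any, Callable, List, Sequence


def time_batches(insert_fn: Callable[[Sequence[Any]], Any],
                 batches: Sequence[Sequence[Any]]) -> List[float]:
    """Call insert_fn once per batch; return each call's elapsed wall-clock time in seconds."""
    durations = []
    for batch in batches:
        start = time.perf_counter()  # wall-clock, so it includes time waiting on the server
        insert_fn(batch)
        durations.append(time.perf_counter() - start)
    return durations
```

With pymongo, collection.insert_many can be passed as insert_fn. Note that pymongo adds an _id field to each inserted dict in place, so every batch should contain fresh dicts rather than reusing the same objects.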

I don't want to run a formal benchmark here, just to get an estimate of database performance for my use case. However, based on other tests I ran in Java and on MySQL vs MongoDB benchmarks published online, these Python results are not at all what I expected: insert performance should be fairly similar for MySQL and MongoDB, or even better for MongoDB.

So, is the pymongo connector slow? Are there parameters I should change in my code or in MongoDB to improve performance? Any suggestions for getting better performance out of MongoDB?


Solution

  • Actually, I think you first need to collect some statistics with a tool such as MongoDB Management Service, and then identify where your bottleneck is.

    Here is my advice:

    • Try using multiprocessing to run several Python processes that insert documents in parallel. This should noticeably improve throughput.

      Have a look at this example.

    • To maximise write performance, sharding is a good option.

    • Check your disk I/O speed; you might try an SSD.

    See the "Write Operation Performance" page of the MongoDB documentation for more details.

    Hope this helps.
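The parallel-insert suggestion can be sketched with the standard library. This version swaps in a thread pool (concurrent.futures.ThreadPoolExecutor) for separate processes: pymongo's MongoClient is thread-safe and pools connections, so one client can be shared by all threads, whereas with multiprocessing each child process must create its own MongoClient because pymongo is not fork-safe. The names parallel_insert and chunk, and the chunk_size and workers defaults, are illustrative assumptions, not tuned values.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, List, Sequence


def chunk(docs: Sequence[Any], size: int) -> List[Sequence[Any]]:
    """Split docs into consecutive slices of at most `size` items."""
    return [docs[i:i + size] for i in range(0, len(docs), size)]


def parallel_insert(insert_fn: Callable[[Sequence[Any]], Any],
                    docs: Sequence[Any],
                    chunk_size: int = 500,
                    workers: int = 4) -> int:
    """Run insert_fn on each chunk concurrently; return how many docs were submitted."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    chunks = chunk(docs, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() consumes the results, so an exception raised in a worker is re-raised here
        list(pool.map(insert_fn, chunks))
    return sum(len(c) for c in chunks)
```

With pymongo this would be called as parallel_insert(collection.insert_many, docs); timing the whole call with time.perf_counter() and comparing it to a single sequential insert_many shows whether the extra concurrency helps on a given setup.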