Tags: python, mysql, asynchronous, tornado, mysql-python

How should MySQL database calls be made in Tornado for an application with a highly scalable infrastructure that issues a large number of database queries?


Which of the following is the better method for making database calls in Tornado in a high-performance application with a highly scalable infrastructure that issues a large number of database queries?

Method 1: Use async database drivers/clients such as TorMySQL, Tornado-MySQL, asynctorndb, etc., which can perform asynchronous database calls.
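The exact APIs of those drivers differ, but the property they all share is that a query yields control back to the event loop instead of blocking it. A minimal stdlib-only sketch of that property, where `asyncio.sleep` is a stand-in for the driver's network round-trip (all function names here are hypothetical, not from any of those drivers):

```python
import asyncio

async def fake_query(sql):
    # Hypothetical stand-in for an async driver call; asyncio.sleep
    # simulates the network round-trip without blocking the event loop.
    await asyncio.sleep(0.05)
    return [("row-for", sql)]

async def handle_request(n):
    # Each "request handler" awaits its query, just as a Tornado
    # coroutine would yield the driver's future.
    return await fake_query("SELECT %d" % n)

async def main():
    # Ten concurrent "requests" finish in roughly one query's latency,
    # because no query ever blocks the loop.
    return await asyncio.gather(*(handle_request(i) for i in range(10)))

results = asyncio.run(main())
print(len(results))  # 10
```

With a blocking driver the same ten requests would take ten queries' worth of latency in a single process, which is exactly the problem the async drivers address.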

Method 2: Use the common synchronous MySQLdb or mysqlclient (by PyMySQL) drivers, move the database calls into separate backend services, and load-balance the calls with nginx. This is similar to what the original FriendFeed team did, as they mentioned in one of the groups.

Bret Taylor, one of the original authors, writes:

groups.google.com/group/python-tornado/browse_thread/thread/9a08f5f08cdab108

We experimented with different async DB approaches, but settled on synchronous at FriendFeed because generally if our DB queries were backlogging our requests, our backends couldn't scale to the load anyway. Things that were slow enough were abstracted to separate backend services which we fetched asynchronously via the async HTTP module.

Other supporting links for Method 2, to better explain what I mean:

Method 3: Use threads together with the IOLoop, while keeping the general synchronous libraries.

Supporting example links to explain what I mean:

Do it synchronously and block the IOLoop. This is most appropriate for things like memcache and database queries that are under your control and should always be fast. If it's not fast, make it fast by adding the appropriate indexes to the database, etc.

  • Another sample method:

    For a synchronous database driver (MySQLdb), we can offload the blocking call to a thread pool:

    from concurrent.futures import ThreadPoolExecutor

    executor = ThreadPoolExecutor(4)

    result = yield executor.submit(mysqldb_operation)
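Those lines can be expanded into a self-contained sketch. The query function below is a hypothetical stand-in for a real blocking MySQLdb call; the executor pattern itself is the point:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def blocking_query(sql):
    # Hypothetical stand-in for a blocking MySQLdb call; time.sleep
    # simulates the query latency.
    time.sleep(0.05)
    return [("row", sql)]

executor = ThreadPoolExecutor(max_workers=4)

# Inside a Tornado @gen.coroutine handler this would be:
#     result = yield executor.submit(blocking_query, "SELECT 1")
# Outside a coroutine the same future can be consumed directly:
future = executor.submit(blocking_query, "SELECT 1")
rows = future.result()  # only this waiter blocks, not the IOLoop
print(rows)
```

The query runs on one of the four worker threads, so the IOLoop thread stays free to serve other requests while it is in flight; `max_workers` bounds how many queries can be outstanding at once.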

Method 4: Just use sync drivers like MySQLdb, start enough Tornado instances, and load-balance with nginx so that the application remains asynchronous at a broader level: some calls block, but other requests still benefit from the asynchronous nature thanks to the large number of Tornado instances.

Explanation:

For details, see www.jjinux.com/2009/12/python-asynchronous-networking-apis-and.html, which says:

They allow MySQL queries to block the entire process. However, they compensate in two ways. They lean heavily on their asynchronous web client wherever possible. They also make use of multiple Python processes. Hence, if you're handling 500 simultaneous requests, you might use nginx to split them among 10 different Tornado Web processes. Each process is handling 50 simultaneous requests. If one of those requests needs to make a call to the database, only 50 (instead of 500) requests are blocked.
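For illustration, a minimal nginx configuration for this setup might look like the following; the upstream name and port numbers are assumptions for the sketch, not taken from the question:

```nginx
# Load-balance across several Tornado processes, each listening on its own port.
upstream tornado_servers {
    server 127.0.0.1:8001;
    server 127.0.0.1:8002;
    server 127.0.0.1:8003;
    server 127.0.0.1:8004;
}

server {
    listen 80;

    location / {
        proxy_pass http://tornado_servers;
    }
}
```

Each Tornado process would be started separately (e.g. a hypothetical `python app.py --port=8001`, one per upstream entry), so a request blocked on a database call stalls only the one process handling it.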


Solution

  • FriendFeed used what you call "method 4": there was no separate backend service; the process was just blocked during the database call.

    Using a completely asynchronous driver ("method 1") is generally best. However, this means that you can't use synchronous libraries that wrap the database operations, like SQLAlchemy. In that case you may have to use threads ("method 3"), which is almost as good (and easier than "method 2").

    The advantage of "method 4" is that it is easy. In the past this was enough to recommend it, because introducing callbacks everywhere was tedious; with the advent of coroutines, method 3 is almost as easy, so it is usually better to use threads than to block the process.