Search code examples
pythonsqldjangoormbulk

Django: Bulk operations


Business:
I encountered a problem - when operating with large datasets with Django ORM, canonical way is manipulate with every single element. But of course this way is very inefficient. So I decided to use raw SQL.

Substance:
I have a basic code which forms SQL query, which updates rows of table, and commiting it:

from myapp import Model
from django.db import connection, transaction
COUNT = Model.objects.count()
MYDATA = produce_some_differentiated_data() #Creating individual value for each row
cursor = connection.cursor()
str = []
for i in xrange(1, COUNT):
    str.append("UPDATE database.table\n"
               "SET field_to_modify={}\n"
               "WHERE primary_key_field={};\n".format(MYDATA, i))


str = ''.join(str)
cursor.execute(str)
transaction.commit_unless_managed() #This cause exception

And on last statement I get this, even when SIZE is small:

_mysql_exceptions.ProgrammingError: (2014, "Commands out of sync; you can't run this command now")

Maybe Django do not allow execute multiple SQL queries at once?

ps Closing cursor before commiting helps to avoid exception, but is this correct?

My expectations:
Im looking for every possible solid solution for bulk operations (preferably inside Django). I dont care about will it be ORM or raw SQL, I would have stand with code I pasted above, if I could avoid error. In case of no solutions it will be good at least, just for curiosity, to know reason of this exception.

What I have learned besides answers:
In Django 1.4 was introduced bulk_create, for efficient multiple INSERT operations


Solution

  • Use cursor.executemany(query, param_list) if you need a raw SQL.

    param_list = [("something_1", 1), ("something_2", 2),...]
    # or everything like [(some_number_1, 1)...]. Apostrophes around the substituted
    # "%s" and the complete necessary escaping is added automatically for string
    # parameters.
    
    cursor.executemany("""UPDATE database.table
            SET field_to_modify=%s
            WHERE primary_key_field=%s""",
            param_list)
    

    It has many advantages:

    • The query string is much shorter than a big query and can be fast analyzed/optimized without consuming unnecessary resources by the database planner. (If you parse parameters into SQL manually, you get many different SQL command that must be analyzed individually.)
    • Parsing strings into SQL manually is a bad practise because it can be a security issue (SQL injection attack) if you don't escape unexpected apostrophes and backslashes from the user input correctly.

    It is an undocumented method though both methods execute(self, sql, params=()) and executemany(self, sql, param_list) are supported for cursor objects by all native db backends (mysql, postgesql_psycopg2, sqlite3, oracle) for long time since Django-0.96 to the current 1.5-beta. A useful similar answer is https://stackoverflow.com/a/6101536/448474 .

    The method executemany had two fixed issues related to exception handling in previous years. So, verify for your Django version that you get helpful error messages if you intentionally cause a database exception, too much %s or too little etc. Yet, a few minutes of initial tinkering/testing are faster than many hours of waiting for slow methods.