Implement batch insert to improve performance

I have written the following code to insert data into MemSQL, whose syntax is almost the same as MySQL's:

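from memsql.common import database

# HOST, PORT, USER, PASSWORD and DATABASE are configuration constants defined elsewhere.


def get_connection(db=DATABASE):
    """ Returns a new connection to the database. """
    return database.connect(host=HOST, port=PORT, user=USER, password=PASSWORD, database=db)


def insert_data(data):
    print('inserting data...')

    for item in data:
        vars_to_sql = []
        keys_to_sql = []
        print(item)
        for key, value in item.iteritems():
            if key == '__osHeaders':
                value = str(value)
            # Encode unicode to ASCII byte strings (dropping non-ASCII characters)
            # so that %r below renders them as '...' rather than u'...'
            if isinstance(value, unicode):
                vars_to_sql.append(value.encode('ascii', 'ignore'))
                keys_to_sql.append(key.encode('ascii', 'ignore'))
            else:
                vars_to_sql.append(value)
                keys_to_sql.append(key)

        keys_to_sql = ', '.join(keys_to_sql)
        # A new connection and a single-row INSERT for every record; values are
        # formatted with %r instead of being passed as query parameters
        with get_connection() as conn:
            c = conn.execute("INSERT INTO tablename (%s) VALUES %r" % (keys_to_sql, tuple(vars_to_sql)))
            print(c)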

The field names cannot be hard-coded, since they may change depending on the data I receive from the other end; in any case, each item I'm iterating over is a dictionary. Inserting one row at a time is very slow, so I need to take the batch size as a variable, build the query statement accordingly and insert each batch. For example, with a batch size of 2 the query would be INSERT INTO tablename (col1, col2) VALUES ('a', 'b'), ('c', 'd')

How can I introduce batching into this code?
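A minimal sketch of the batched statement described above, without any MemSQL-specific helper. The names chunked and build_batch_insert are hypothetical, and the sketch assumes a driver that accepts %s placeholders with a flat parameter list (MySQLdb-style format paramstyle); with the memsql library that would presumably be conn.execute(sql, *params). Unlike the %r formatting in the question, the values are left to the driver to escape; only the table and column names are interpolated, so they are assumed to be trusted.

```python
def chunked(items, batch_size):
    """Yield successive lists of at most batch_size items from any iterable."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # final, possibly shorter batch
        yield batch


def build_batch_insert(table, rows):
    """Build one multi-row INSERT for dict rows that share the same keys.

    Returns (sql, params): the SQL has one %s placeholder per value and
    params is the matching flat list, so the driver does the escaping.
    Table and column names are assumed to be trusted identifiers.
    """
    if not rows:
        raise ValueError("rows must not be empty")
    columns = sorted(rows[0])
    for row in rows:
        if set(row) != set(columns):
            raise ValueError("all rows in a batch must have the same columns")
    col_sql = ", ".join("`%s`" % col for col in columns)
    row_sql = "(" + ", ".join(["%s"] * len(columns)) + ")"
    values_sql = ", ".join([row_sql] * len(rows))
    sql = "INSERT INTO `%s` (%s) VALUES %s" % (table, col_sql, values_sql)
    # Flatten row-by-row, in the same column order as the placeholders
    params = [row[col] for row in rows for col in columns]
    return sql, params


# Example: batch_size is a variable, as the question asks.
data = [{"col1": "a", "col2": "b"}, {"col1": "c", "col2": "d"}]
for batch in chunked(data, 2):
    sql, params = build_batch_insert("tablename", batch)
    print(sql)
    print(params)
```

For the two-row example this produces INSERT INTO `tablename` (`col1`, `col2`) VALUES (%s, %s), (%s, %s) with params ['a', 'b', 'c', 'd']. Most of the speedup would likely come from sending one statement per batch over a single, reused connection instead of opening a connection for every row.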

Solution

If you are using the MemSQL Python library, you can use the multi_insert helper provided in the memsql.common.query_builder module. For example:

from memsql.common.query_builder import multi_insert
from memsql.common.database import connect

sql, params = multi_insert("my_table", { "foo": 1 }, { "foo": 2 })
# sql = 'INSERT INTO `my_table` (`foo`) VALUES (%(_QB_ROW_0)s), (%(_QB_ROW_1)s)'
# params = {'_QB_ROW_0': [1], '_QB_ROW_1': [2]}

with connect(...) as conn:
    conn.execute(sql, **params)

Note that multi_insert requires every record to define the same set of columns, because it translates the records into a tuple-based (positional) VALUES list for the query.
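Because the question's records may not all have the same keys, they would need to be split into batches that share a column set before being passed to multi_insert. A minimal sketch, using the hypothetical helper name batches_with_same_columns; it only groups and chunks the dictionaries and does not touch the database:

```python
from collections import OrderedDict


def batches_with_same_columns(rows, batch_size):
    """Group dict rows by their exact set of keys, then split each group
    into lists of at most batch_size rows.

    Every yielded batch has identical columns, so it satisfies the
    same-columns requirement of a tuple-based multi-row INSERT.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    groups = OrderedDict()  # keeps column sets in first-seen order
    for row in rows:
        groups.setdefault(frozenset(row), []).append(row)
    for group in groups.values():
        for start in range(0, len(group), batch_size):
            yield group[start:start + batch_size]


rows = [{"a": 1}, {"a": 2, "b": 3}, {"a": 4}, {"a": 5}, {"b": 6, "a": 7}]
for batch in batches_with_same_columns(rows, 2):
    print(batch)
```

Each batch could then be sent with sql, params = multi_insert("tablename", *batch) followed by conn.execute(sql, **params), reusing one connection for the whole loop. Grouping changes the insertion order across different column sets; if that matters, or if NULL is acceptable for absent fields, an alternative is to fill missing keys with None so that all rows share one column set.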