Context:
A python 3.6 script is updating a Sqlite database several times a day using sqlite3 module.
The database is ~500Mo, each update adds up ~250Ko.
Issue:
I deliver every updated versions of the database and would like to reduce the size of the data transferred. In other words, I would like to transfer only the updated content (through a kind of patch).
The sqldiff.exe utility program could be used for that, nevertheless, it requires to create a local copy of the database every time I update it.
Question:
Is there a way, using Python (through the DB-API 2.0 interface or using other ways in Python), to generate this kind of patch while updating the database?
First thoughts:
Wouldn't it be possible to write a patch (e.g. a list of actions to be done to update the database) based on the cursor before/while performing the commit?
import sqlite3
# Open database
conn = sqlite3.connect('mydb.db')
cur = conn.cursor()
# Insert/Update data
new_data = 3.14
cur.execute('INSERT INTO mytable VALUES (?)', (new_data,))
# KEEP TRACK & Save (commit) the changes
conn.dump_planned_actions() # ?????
conn.commit()
conn.close()
The following snippet shows the workaround I found.
It relies on the Sqlite3 method set_trace_callback
to log all the SQL statements sent and executescript
to apply these statements.
import sqlite3
class DBTraceCallbackHandler(object):
"""Class handling callbacks in order to log sql statements history."""
def __init__(self):
self.sql_statements = []
def instance_handler(self, event):
self.sql_statements.append(str(event))
def database_modification(cursor):
# user-defined
pass
def create_patch(db_path):
# Openning connection
conn = sqlite3.connect(db_path)
c = conn.cursor()
# Start tracing sql
callback_handler = DBTraceCallbackHandler()
conn.set_trace_callback(callback_handler.instance_handler)
# Modification of database
database_modification(c)
# End of modification of database
conn.commit()
c.close()
# Generating the patch - selecting sql statements that modify the db
idx_rm = []
for idx, sql_statement in enumerate(callback_handler.sql_statements):
if not any([sql_statement.startswith(kw) for kw in ['UPDATE', 'INSERT', 'CREATE']]):
idx_rm.append(idx)
for idx in sorted(idx_rm, reverse=True):
del callback_handler.sql_statements[idx]
return ';\n'.join(callback_handler.sql_statements) + ';\n'
def apply_patch(db_path, sql_script):
# Openning connection
conn = sqlite3.connect(db_path)
c = conn.cursor()
# Modification of database - apply sql script
c.executescript(sql_script)
# End of modification of database
conn.commit()
c.close()