Search code examples
sqlitepython-3.xdiffpatchpython-db-api

Create Sqlite database patches while updating


Context:
A python 3.6 script is updating a Sqlite database several times a day using sqlite3 module.
The database is ~500Mo, each update adds up ~250Ko.

Issue:
I deliver every updated versions of the database and would like to reduce the size of the data transferred. In other words, I would like to transfer only the updated content (through a kind of patch).
The sqldiff.exe utility program could be used for that, nevertheless, it requires to create a local copy of the database every time I update it.

Question:
Is there a way, using Python (through the DB-API 2.0 interface or using other ways in Python), to generate this kind of patch while updating the database?


First thoughts:
Wouldn't it be possible to write a patch (e.g. a list of actions to be done to update the database) based on the cursor before/while performing the commit?

import sqlite3

# Open database
conn = sqlite3.connect('mydb.db')
cur = conn.cursor()

# Insert/Update data
new_data = 3.14
cur.execute('INSERT INTO mytable VALUES (?)', (new_data,))

# KEEP TRACK & Save (commit) the changes 
conn.dump_planned_actions()  # ?????
conn.commit()
conn.close()

Solution

  • The following snippet shows the workaround I found.

    It relies on the Sqlite3 method set_trace_callback to log all the SQL statements sent and executescript to apply these statements.

    import sqlite3
    
    class DBTraceCallbackHandler(object):
        """Class handling callbacks in order to log sql statements history."""
        def __init__(self):
            self.sql_statements = []
        def instance_handler(self, event):
            self.sql_statements.append(str(event))
    
    def database_modification(cursor):
        # user-defined
        pass
    
    def create_patch(db_path):
        # Openning connection
        conn = sqlite3.connect(db_path)
        c = conn.cursor()
        # Start tracing sql
        callback_handler = DBTraceCallbackHandler()
        conn.set_trace_callback(callback_handler.instance_handler)
        # Modification of database
        database_modification(c)
        # End of modification of database
        conn.commit()
        c.close()
        # Generating the patch - selecting sql statements that modify the db
        idx_rm = []
        for idx, sql_statement in enumerate(callback_handler.sql_statements):
            if not any([sql_statement.startswith(kw) for kw in ['UPDATE', 'INSERT', 'CREATE']]):
                idx_rm.append(idx)
        for idx in sorted(idx_rm, reverse=True):
            del callback_handler.sql_statements[idx]
    
        return ';\n'.join(callback_handler.sql_statements) + ';\n'
    
    
    def apply_patch(db_path, sql_script):
        # Openning connection
        conn = sqlite3.connect(db_path)
        c = conn.cursor()
        # Modification of database - apply sql script
        c.executescript(sql_script)
        # End of modification of database
        conn.commit()
        c.close()