Search code examples
sqldatabasesqlitepysqlite

Recovering database rows from an uncommitted transaction


We have a database that was written to by a program written in Python that uses the sqlite3 module. The database had a large number of insert statements executed on it, but the transaction was never ended by a commit.

The result is that we have two files:

     Size             Time       Name
855117824 2010-12-14 15:27 db
  1665240 2010-12-14 15:27 db-journal

The database file is large, but most of the data is uncommitted so when we select from the database, we only get a few rows. When we execute the sql command 'VACUUM', the database shrinks to about 3MB.

Is there any way to get the data back?


Solution

  • I performed a bit of testing using the sqlite3 shell program.

    Assuming that the sqlite3 Python module behaves in the same manner, there does not seem to be a way to reliably recover uncommitted transactions.

    For a relatively small number of statements, uncommitted transactions seem to remain entirely in the application memory only and no data is written to the filesystem. Those insertions are completely lost once the DB connection is closed or the application terminates.

    For larger transaction blocks data is written to the filesystem, but it is cleaned up once the DB connection is closed or (if the application crashes) on the next open of the DB. In simple terms, new DB pages get allocated for the uncommitted transaction, but if the transaction is not committed they are considered free space, which is why VACUUM reduces the DB size. These pages will be written to (and their data lost) at the next write to the DB file. If they are at the end of the DB file, the file simply gets truncated at clean-up.

    You might be able to recover some data from the last uncommitted transaction that was performed, as long as no other write transaction was performed afterwards. From the way your question is phrased, it sounds as if the whole DB was created and filled in a single program run and a single transaction (although VACUUM would not produce such a large file then). In that case things could be a bit easier.

    It heavily depends on how the misbehaving program was terminated. If you allowed it to terminate gracefully it may have had the time to clean-up, which in this case is not desirable. Since you have the DB journal I will assume that it had a more violent end.

    In any case, you would have to at least delve into the sqlite3 DB file format and modify the library code to parse the uncommitted data. You will still lose those parts of the transaction that remained in the application memory.

    If there were free pages (e.g. from DELETE statements) in the DB file there might also be fragments of older transactions, although interpretting those fragments is another story.

    In my opinion, the whole operation will stray way too close (if not outright enter) to the fields of computer forensics and data recovery, along with all the relevant issues. Unless you have really important data that you cannot get in any other way, I doubt it will be easy enough to be worth the trouble.