Search code examples
csqliterace-conditiontocttou

How do I detect if sqlite3 created a database file?


I'm writing a program that uses a sqlite3 database file to store its data. If I open a database file with

sqlite3_open_v2(filename, &db, SQLITE_OPEN_READWRITE | SQLITE_OPEN_CREATE, NULL)

the database file is created if it does not exist. How can I find out if the database file existed prior to opening it? The sqlite3 shell uses code like this:

  /* Go ahead and open the database file if it already exists.  If the
  ** file does not exist, delay opening it.  This prevents empty database
  ** files from being created if a user mistypes the database name argument
  ** to the sqlite command-line tool.
  */
  if( access(data.zDbFilename, 0)==0 ){
    open_db(&data, 0);
  }

This code however has a race condition when the database file is created by another process after the access call and before the open_db call (time-of-check vs. time-of-use).

Another answer (that I can't find right now) suggests to check the application_id and user_version fields. If they are zero, a database was just created. I researched this approach and found that numerous applications actually don't bother to set these fields in newly created databases, so this approach is fuzzy at best and I don't think it solves my problem.

Is there a method to find out if the database existed prior to opening it that does not involve a race condition like this? It is also acceptable if I can merely find out if the database file was initialized (as in, a truncated file was populated with a sqlite3 header) by sqlite3.

The purpose of having such a routine is to be able to find out if I need to create all the tables I need in the database. I don't want to accidentally overwrite another file placed there by a different application.

Sample code

A simplified illustration of the problem can be found in the following bash script. It simulates two applications. They both work in similar ways:

  1. Create or open a database test.db
  2. If the database did not exist before, create a table test and write a single row into it. The value is 1 for the first application and 2 for the second.
  3. Print the contents of the database.

Here is the code:

#!/bin/sh

# sleeps are inserted to make the race conditions easier to trigger

application() {
    echo Checking if database exists...
    [ ! -f test.db ]
    status=$?
    sleep 2

    if (exit $status)
    then
        echo Database not found, making tables
        sqlite3 test.db "CREATE TABLE IF NOT EXISTS test (a);"
        sleep 2
        echo Writing initial record into database
        sqlite3 test.db "INSERT INTO test VALUES ($1);"
        sleep 2
    else
        echo Database found, checking if it belongs to me
    fi

    echo Printing content of database
    sqlite3 test.db "SELECT * FROM test;"
}

rm -f test.db
echo First test: app1 and app1 race
application 1 & (sleep 1 ; application 1)

rm -f test.db
echo Second test: app2 and app1 race
application 2 & (sleep 1 ; application 1)

My goal is to ensure that the following cases can never happen:

  • An instance of application one opens the database file, concludes that it isn't initialized and initializes it (by creating the table and writing the initial record) even though it already contains data from a different instance of the same application or from a different application.
  • An database file belonging to a different application is written into.

If application was programmed correctly, each run would only initialize the database if it wasn't initialized before. Thus, the only output you would see is either a database containing only the row 1 or a database that contains only the row 2.


Solution

  • It is not possible to differentiate a newly-created database from an empty database that was created earlier.

    However, an empty database is the only one with this problem. Any other database can be detected by checking if the sqlite_master table is not empty. Do this and your table creation inside a transaction, and there is no race condition.

    If your first write to the database (which is when the file is actually created) is setting the application_id, then you know that any file with another ID is not yours. (Registered application IDs are unique.)