Search code examples
pythongitsqlitegithubpython-poetry

Should an embedded SQLite DB used by CLI app be uploaded to version-control (Git)?


I'm working on a Python CLI app that has to manage some data on a sqlite db (creating, updating and deleting records). I want the users to be able to install the app and use it right away. So my question is, can I just upload an empty sqlite db to GitHub? Or should I just upload a schema file and during installation build the db in a build step? I suppose if going the second way, users should have sqlite pre-installed or else the installation will fail. What I want is for them to just install the app, without worrying about dependencies and such.


Solution

  • When it comes to SQLite, My understanding is that SQLite is generally used as an embedded DB thus users wouldn't need to have SQLite preinstalled. (Of course, it can be used as a standalone DB server, but it's mainly known for its "ease of embeddability" aka...simply just run). Without any effort, in the embedded form, the client itself would create this db.

    Using SQLite is just a one-liner as:

    conn = sqlite3.connect('my.db')
    

    or

    conn = sqlite3.connect('/path/to/my.db')
    

    Or even in-memory (as cache)

    conn = sqlite3.connect(':memory:')
    

    When this line runs, it would create a connection by either opening the file (if it exists) or create this file (as an empty DB) if the file is not present. In short, The SQLite library will always read the existing file or create it if it doesn't exist. Thus, You will always have a running DB out of the box. (The only time I can see it failing is if this db file is corrupt for some reason or the SQLite library cannot create the file in a location due to permission issues)

    From a user perspective (or developer perspective for that matter), there is nothing that needs to be done to install SQLite. There are no external dependencies for embedded DB or anything to be preinstalled. It simply works. If there are other applications that share this database, they just need to open the particular db file and that's it.

    Therefore coming back to your main question, the general best practice is that the application instantiates the database (Whatever the DB is for that matter) on its first run by importing the SQL/Schema (and initial data) file (SQL File, CSV, JSON, XML, from code etc...). The SQL/Schema file can be maintained along with the application source in Github (or whatever VCS) or packaged with the binary in the packaged format (zip, tar...etc) that is given for distribution. So in your case, the second approach that you have thought of might be better. This is even good from a code maintenance and review perspective.

    It is best not to upload the "database" as a binary, rather instantiate it on the first run and populate it with data.