Tags: django, sqlite, gitlab, migration, gitlab-ci

Testing new Django database migrations on a filled live database or a freshly created one?


I'm currently working on a Django project running on Docker with an SQLite3 database. We are using GitLab for version control (including the GitLab CI system). I'm just testing for now and we have not deployed yet/are not live, but I have some questions about the final workflow after deploying. Regarding the CI system, I have been searching for a while, but I didn't find a clear answer on how the migrations and the database itself are tested in the CI system and what a proper development workflow looks like. I'm mainly concerned with the following point:

  • All our migration files are under version control from the very first state of the database (we will start with a fresh db.sqlite when going live). Is it then necessary to check every following migration against the (filled) live database (or its latest backup)? Or can I just create a fresh database for every migration check (whenever model changes are made)?

To give an example maybe:

  • I'm a developer and make changes to some models of the project.
  • Is it OK for testing to create a fresh database, first apply the old migrations to it, then run 'makemigrations' for the new changes, and then apply the migrations for those changes? (See the sketch after this list.)
  • Or do I have to somehow pull the current live database (or its latest backup) with all the already applied migrations, then run 'makemigrations' for the new changes, and then apply the resulting migrations?
  • And after that, push the changes and the new migration files to GitLab, deploy, and then run the migrations on the production server.
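
For reference, the first approach would boil down to something like this (a minimal sketch, assuming a standard Django project with manage.py at the repository root):

    rm -f db.sqlite3                 # start from a clean SQLite database
    python manage.py migrate         # apply all existing (committed) migrations
    python manage.py makemigrations  # generate migration files for the new model changes
    python manage.py migrate         # apply the newly generated migrations
    python manage.py test            # run the test suite against the migrated schema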

Are there any caveats to doing it the first way (with the fresh database)? And how should the CI pipeline properly check that the migrations are correct?


Solution

  • In general, you can get away with using a fresh database for testing, but it is good practice to test your migrations on a representative database (same engine/version/size as in production) that has existing records in it, for a couple of reasons:

    1. Not all model changes work the same on an empty database compared to one with existing records
    2. For large databases, you may want to measure the performance of a migration before applying it in production -- is it going to take 10 seconds or 10 hours to apply the migration?

    You could handle this in a few ways. You might create a new database from a backup, as you suggest. You might use or generate seed data, for example by loading a minimized data set or generating data with a script. Or you might have a live pre-production environment, such as a 'staging' environment, where you can conduct this testing.
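
    As a minimal sketch of the seed-data option (the fixture file name here is hypothetical):

        python manage.py migrate                  # apply committed migrations to a fresh DB
        python manage.py loaddata seed_data.json  # load a minimized, representative data set
        # ...then generate and apply the new migrations and run the tests against this data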


    Additional notes:

    Regarding (1): Depending on exactly what you changed in your model, you may end up needing to apply changes to existing records. For example, adding a new non-nullable field will migrate without issues on an empty database. However, if there are existing records, you won't be able to migrate without providing a value for the new non-nullable field during the migration (or, if you set the default keyword on the field, that value will be used).
    This doesn't necessarily require that your database be a copy/backup of your production database, but it does require that it contain at least some records affected by the migration, so you know your migration will behave as intended before applying it in production.
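
    To make this concrete, here is a hypothetical model change of that kind (the Customer model and nickname field are invented for illustration):

        from django.db import models

        class Customer(models.Model):
            name = models.CharField(max_length=100)
            # New non-nullable field. Without a default (or null=True),
            # makemigrations prompts for a one-off default because existing
            # rows need a value; with default set, the migration backfills
            # existing rows and applies cleanly on a populated database too.
            nickname = models.CharField(max_length=100, default="")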

    Regarding (2): Additionally, there may be performance considerations, which can vary depending on the database engine you are using or other characteristics of your database server. For example, on PostgreSQL versions before 11, adding a new non-nullable field (or a nullable field with a default set) triggers a complete rewrite of the table, which can take a significant amount of time on large tables with many records.
    Therefore, if you're concerned about performance (which you should be if you have a large database), you ideally want to test the performance of your migrations so you can anticipate the operational impact of running them in production (e.g., plan downtime for long migrations).
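
    One lightweight way to do this (the app label and migration number below are placeholders) is to inspect the generated SQL first, then time the migration against a production-sized copy:

        python manage.py sqlmigrate myapp 0042    # print the SQL this migration would run
        time python manage.py migrate myapp 0042  # time it against a representative copy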

    Regarding makemigrations: in most cases, creating automatic migrations does not require a connection to any database, because migrations are defined at the ORM layer and don't require any knowledge of the current database state. However, because the order of migrations matters, you need to make sure that all previously existing migration files are present when you run makemigrations (that is, make sure your local migrations directory is not behind whatever is deployed or committed to your trunk or default branch). This is something to be careful of if multiple feature branches are making model changes in the same app of your Django project at the same time.
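
    In a GitLab CI pipeline, one common safeguard (the job and image names here are assumptions) is to fail the build when model changes are missing a migration, and to verify that all committed migrations apply cleanly to a fresh database:

        # .gitlab-ci.yml (sketch)
        check_migrations:
          image: python:3.11
          script:
            - pip install -r requirements.txt
            # exits non-zero if model changes have no corresponding migration
            - python manage.py makemigrations --check --dry-run
            # verify the full migration history applies to a fresh database
            - python manage.py migrate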