Search code examples
databasemigrationdatabase-schema

Separating schema and data migrations


We currently have mixed data and schema migrations, that are being run by app's ORM right before the app startup. Data migrations in our case is a type of migration when we manipulate the data, for example moving data around different columns, fixing data format issues, fixing some wrong data or in some rare cases moving data between different services databases. They are normally simple ( a few create/update statements), but there are cases when we leverage programming language with some logic to calculate a final state. What I'm looking for is to make database schema declarative and apply schema migrations whenever someone changes db schema file in the repository. This could be achieved by using some tools like atlasgo.io, where you have a schema file defined and generate versioned schema migrations by calculating a diff between state file and actual database.

So if I apply this approach, our schema and data migrations become separate. I planned to run schema migrations first using schema migration tools and only then data migrations using the same ORM approach. But it looks that it will bring us some troubles. Imagine if we don't deliver code on some environment for some time and there are a few data and schema migrations ready to run. Because I run schema migrations first - there are high chances that data migrations may fail because they could expect some schema that may have already been changed. So the order of schema and data migrations is very important.

Do you have any advice how do you run such things on your projects?


Solution

  • I wanted to sum up and post the answer that worked for me based on the answers and the data that I was able to find in the internet.

    First of all - yes, there is no possibility and even need to separate data and schema migrations. But there is a still possibility to benefit having a declarative database schema in your code which you could use to automatically generate schema migrations (alter table or create table statements).

    Specifically for me, I decided to keep using the ORM approach as a runner of all kind of migrations (because of the order issue mentioned above, therefore I choose one tool to run them all) and having a schema file in my application code which is being used by a separate tool (atlas as mentioned before) to calculate the diff between current schema and target schema and produce ALTER table statements as a migration whenever developer wants to make an update the schema. And as a last step, I have taught my ORM migration tool to run both types of migrations: produced bare sql schema migrations and data migrations that we already had. Hope this answer will help someone else.