Search code examples
sqlruby-on-railsrubypostgresqlactiverecord

Preventing ActiveRecord migration from making huge changes in db/structure.sql


When running the ActiveRecord migration below on the test (not production) database, large changes in db/structure.sql appear. This is reproducible. The changes are:

  • A small number of the expected changes based on the migration code (as desired).
  • A large number of unexpected changes, which are unrelated to the migration code (not desired). All these unexpected changes rearrange the order of the tables, views and materialized views that are not mentioned in the SQL code in the migration. These rearranged tables, views and materialized views are not associated (e.g., by dependencies) with the items in the SQL code in the migration.

What causes these large unrelated differences, and, most importantly, how to prevent these large unrelated differences?

Could they be related to the mismatch between the postgresql versions on the production database server and on the machine where the migrations are executed on the test database?

These unrelated changes make git diff output much less useful. Same goes for diff results on GitHub.

In this tiny minimal example below, just a few tables are rearranged. But in the real life (larger) migration it changes the order of dozens of tables, views and matviews - a huge change.

class FixFooBarBaz < ActiveRecord::Migration[5.2]

  def up
    sql = <<~SQL
                  DROP VIEW master_foo;
                  DROP VIEW IF EXISTS bar1;
                  CREATE OR REPLACE VIEW bar1 AS
                  SELECT * FROM baz1
                  UNION ALL
                  SELECT * FROM baz2;
                  CREATE OR REPLACE VIEW master_foo AS
                  SELECT
                      *,
                      'bar1' AS bar_type
                  FROM
                      bar1
                  UNION ALL
                  ...
    SQL

    execute sql
  end

  def down
    sql = <<~SQL
                  DROP VIEW IF EXISTS master_foo;
                  DROP VIEW IF EXISTS bar1;
                  ...
    SQL

    execute sql
  end
end 

Hardware/software used:

Machine where migrations are executed:

MacBookPro, Apple M1 Max, macOS Sonoma 14.2.1
ruby 2.7.3p183 (2021-04-05 revision 6847ee089d) [arm64-darwin21]
15.3 psql (PostgreSQL) 15.3 (Homebrew)

Production database server:

psql (14.9 (Homebrew), server 13.3 (Ubuntu 13.3-1.pgdg20.04+1))

Solution

  • No, this is not related to different DB versions, it depends more on the order that different users run the migrations. If you have all your team blow away their DBs, and reload them from a common dump, then they'll all produce the same output... until someone runs a migration in a different order (which obviously will happen as soon as more than one user creates a new migration).

    Unfortunately, there's no simple/nice way to resolve this.

    Update replying to comments from OP:

    each running somewhat different postgresql versions on their laptops

    Right, as I said, in my experience it has nothing to do with the PG version (maybe big version differences makes a difference, I never have big differences in my teams so can't comment).

    many years, many hundreds of migrations, and across a small team of developers [...] The changes in db/structure.sql were always consistent with the corresponding migrations

    I've had the same experience, and then once things start changing/conflicting, it seems they never go back to the way they were. I've spent a lot of time (like a week of dedicated time) looking into it and never found any heuristic for why things started, nor how to return to the way things were. I have seen situations where only one member of a team experiences the "self-created migration triggers it in own structure", ie. other members can still create a migration, and the structure/schema is updated sensibly. But by this time it's often multiple members all experiencing the same thing, and obviously once one member experiences it then others will start seeing the conflicts/big changes when they pull from the shared repo.

    Seems there's something internal in PG where after a certain level of schema complexity / db object size, it has to start storing new objects in some way that when they change the dump changes significantly. I've never seen it in a small DB.

    Another Update

    I was reminded of this gem https://github.com/jakeonrails/fix-db-schema-conflicts - I believe it requires you switch to using schema.rb, and when I tried it there were still some things it wasn't helping with, but they were quite specific, like triggers.