Search code examples
kubernetesjenkinsliquibaseflywaycicd

Liquibase update when pod goes down


I am deploying liquibase scripts using CI CD pipeline. There are some instances where a liquibase changeset transaction might take very long and the pod may go down during that time. For example, a changeset adds a new non-null column into an existing table which already has a million records. A default value will be added to the existing rows of table. I would like to know what happens when the pod goes down after few rows are updated with default value.


Solution

  • The answer somewhat depends on your database and were it actually is killed. What liquibase tries to do is:

    First, update the databasechangeloglock table as "locked" and commit it

    Then, for each changeset

    1. Start a transaction
    2. Execute each change in the changeSet, execute the SQL statement(s) required
    3. Mark the changeset as ran in the databasechangelog table
    4. Commit the transaction

    Finally, update the databasechangeloglock table as "unlocked" and commit it.

    If the pod is killed randomly in that process, the impact will depend on exactly where it was killed and what is going on.

    Percentage of time is in #2 above, so that's likely where it is killed. Because we try to run in a transaction, when the connection is cut the database should automatically roll back the transaction. BUT: some statements are auto-committing and can mess that up and leave things partly done.

    If you have a changeset that is just doing an update of all the rows and the pod is killed during that, most databases can just roll back that update and none of the rows will be updated and next time liquibase runs it knows the changeset has not been ran and it will retry the update.

    If you have a changeset that adds a column AND updates the row and it is killed during the update, most databases will have committed the "add column" so the rollback will only undo the update of values. And since the changeset is not marked as ran, the next update will try to run it again and will fail with a "column already exists" exception.

    For that reason, it's best to have a single change per changeSet unless they can all be ran in a single transaction.

    If it fails anywhere else in that process, it's still the same "database will roll back the current transaction, so it depends on what happens to be in the current transaction".

    Regardless of where it fails in the changeSet, you'll also have an issue with the "unlock the databasechangeloglock table" step not being ran. The next liquibase run will block until it's unlocked. For managed CICD systems, the infrastructure can do a better job of "make sure just one version of liquibase is running" than liquibase does with the databasechangeloglock table, so you can add a "liquibase unlock" as the first step of your pod to be safe.