Search code examples
sql-serverazure-sql-databasedatabase-administration

Strategies to modify huge database


I am testing different strategies for a incoming breaking change. The problem is that each experiment would carry some costs in Azure. The data is huge, and can have some inconsistencies due to many years with fixes and transactions before I even knew the company.

I need to change a column in a table with million of records and dozens of indexes. This will have a big downtime.

ALTER TABLE X ALTER COLUMN A1 decimal(15, 4) --The original column is int

One of the initial ideas (Now I know this is not possible) is to have a secondary replica, do the changes there, and, when changes finish, swap primary with secondary... zero or almost zero downtime. I am referring to a "live", redundant replica, not just a "copy"

EDIT: Throwing new ideas:

  1. Variations to what have been mentioned in one of the answers: Create a table replica (not the whole DB, just the table), apply a INSERT INTO... SELECT and swap the tables at the end of the process. Or... do the swap early to minimize downtime in trade of a delay during the post-addition of all records from the source

I have tried this, but takes AGES to complete. Also, some null and FK violations make the process to fail after processing for several hours. "Resuming" could be an option but it makes the process slower with each execution. Without some kind of "Resume", each failure have to be repeated from scratch

An acceptable improvement could be to IGNORE the errors (but create logs, of course) and apply fixes after migration. But afaik, AzureSql (nor SqlServer) doesn't offer an "ignore" option

  1. Drop all indexes, constraints and dependencies to the column that needs to be modified, modify the column and apply all indexes, constraints and dependencies again.

Also tried this one. Some indexes take AGES to complete. But for now seems to be the best bet. There is a possible variation by applying ROW COMPRESSION before the datatype change, but I think it will not improve the real deal: index re-creation

  1. Create a new column with the target datatype, copy the data from the source column, drop the old column and rename the new one. This strategy also requires to drop and regenerate indexes, so it will not offer lot of gain (if any) with regards #2.

A friend thought of a variation on this, which is to duplicate the needed indexes ONLINE for the column copy. In the meanwhile, trigger all changes on source column to the column copy.

For any of the mentioned strategies, some gain can be obtained by increasing the processing power. But, anyway, we consider to increase the power with any of the approaches, therefore this is common for all solutions


Solution

  • After many tests and backups, we finally used the following aproach:

    1. Create a new column [columnName_NEW] with the desired format change. Allow NULLS
    2. Create a trigger for INSERTS to update the new column with the value in the column to be replaced
    3. Copy the old column value to the new column by batches This operation is very time consuming. We ran a batch every day in a maintenance window (2h during 4 days). Our batch filled the values taking oldest rows first, we counted on the trigger filling the new ones
    4. Once #3 is complete, don't allow NULLS anymore on the new column, but set a default value to avoid the INSERT trigger to crash
    5. Create all the needed indexes and views on the new column. This is very time consuming but can be done ONLINE
    6. Allow NULLS on the old column
    7. Remove the insert trigger - start downtime now!
    8. Rename the old column to [columnName_OLD], the new to [columnName]. This requires few downtime seconds!

    --> You can consider it is finally done!

    1. After some safe time, you can backup the result and remove [columnName_OLD] with all of its dependencies

    I selected the other answer, because I think it could be also useful in most situations. This one has more steps but has a very little downtime and is reversible at any step but the last.