Tags: sql-server, indexing, large-data, calculated-columns

Adding computed persisted column to large table


I need to add a computed persisted column to a large table (~1B rows). Can I do this in batches, or by using an existing precalculated column?

I first tried with just a simple column:

ALTER TABLE [dbo].[T] ADD [X] [decimal](32, 6) NULL
GO
UPDATE [dbo].[T]
SET [X] = [Y] / [Z]

After about 14 hours this completely filled two disks with transaction log and failed. So I ran the update in a loop, in batches - it finished within 7 hours with no blocking of user queries.
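The batched loop can be sketched roughly as below; the batch size is illustrative, and the `WHERE [X] IS NULL` predicate assumes NULL marks not-yet-updated rows:

    -- Sketch of the batched update (batch size is illustrative)
    DECLARE @BatchSize int = 100000, @Rows int = 1;
    WHILE @Rows > 0
    BEGIN
        UPDATE TOP (@BatchSize) [dbo].[T]
        SET [X] = [Y] / NULLIF([Z], 0)
        WHERE [X] IS NULL;        -- only rows not yet updated
        SET @Rows = @@ROWCOUNT;
        CHECKPOINT;               -- simple recovery model: lets the log truncate between batches
    END

Keeping each batch in its own implicit transaction is what keeps the log small; under the simple recovery model, a `CHECKPOINT` between batches allows log space to be reused.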

Now I need this column to be automatically maintained for new records - hence I'm considering a persisted computed column. I want the downtime of the table to be as small as possible (ideally none). Given my experience with the simple update vs. the batched update, I'd like to somehow do this in batches, or reuse the existing column (which already holds the result of the calculation) - is there any way to achieve this?

I need a persisted column because I need to index it afterwards, and also because I'm optimizing a query where a Compute Scalar (of the exact expression I'm going to persist) takes a considerable amount of time. I also considered an indexed view, but I'm worried the same long-running-transaction issue could occur.

SQL Server 2016 (Enterprise), simple recovery model.

EDIT: For my future reference (and in case anybody finds this helpful) - here are the options I considered (and tested):

  • Simple alter:

ALTER TABLE [dbo].[T] ADD [X] AS ([Y] / [Z]) PERSISTED

Pros: simple, integrity ensured

Cons: single transaction - huge transaction log requirements, and if it fails half way, all progress is lost; cannot be done online - any queries against the table are blocked

  • Indexed view

    --indexed view
    CREATE VIEW [dbo].[T_view]
    WITH SCHEMABINDING
    AS
    SELECT [Y], [Z], [Y] / NULLIF([Z], 0) AS [X]
    FROM [dbo].[T]
    GO

    --first index of a materialized view must be unique and clustered
    CREATE UNIQUE CLUSTERED INDEX IDX_T_view
    ON [dbo].[T_view] ([Z])
    GO

Pros: The underlying table does not get fragmented when adding columns

Cons: Mainly the need for a unique column to carry the clustered index, plus it's still a single transaction

  • Non-persisted calculated column with index:

    ALTER TABLE [dbo].[T] ADD [X] AS ([Y] / [Z])

    CREATE NONCLUSTERED INDEX [IX_T] ON [dbo].[T] ( [X] ASC )

Pros: Quick, no table fragmentation (as no physical changes performed)

Cons: The computed column still needs to be computed each time it is selected

  • Insert/update trigger:

Pros: We can first update the data in batches and then let the database take care of newly inserted rows. The column can be used in the WHERE clause of a filtered index

Cons: Ensuring integrity is up to us (there can be inserts/updates in the meantime while we are updating the table in batches)

  • Create new table and iteratively move data:

Pros: Same as with the trigger. A computed column is generally more performant than a trigger; we can also plan the data move so that we end up with a nicely defragmented table

Cons: Same as with the trigger, plus we need extra disk space.

EDIT2: After 2 days of moving the data and building indexes, I found out that a computed column (even a persisted one) cannot be used in the WHERE clause of a filtered index. Even after I moved it from the filter expression to the included columns (so that SQL Server could still answer the query from this index alone), performance degraded severely. So I switched to the insert trigger solution.
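The trigger-based maintenance can be sketched as below; the trigger name and the `[Id]` key used to join back to the base table are hypothetical:

    -- Hypothetical trigger keeping [X] in sync on insert/update;
    -- assumes [dbo].[T] has a key column [Id]
    CREATE TRIGGER [dbo].[trg_T_maintain_X]
    ON [dbo].[T]
    AFTER INSERT, UPDATE
    AS
    BEGIN
        SET NOCOUNT ON;
        -- recompute [X] only for the affected rows
        UPDATE t
        SET [X] = i.[Y] / NULLIF(i.[Z], 0)
        FROM [dbo].[T] t
        JOIN inserted i ON i.[Id] = t.[Id];
    END

Because [X] here is a plain column (not computed), it can appear in a filtered index's WHERE clause; the trade-off is that integrity now depends on the trigger firing for every write path.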


Solution

  • One thing you might consider is creating a new table with the computed persisted column in its definition. You could then populate this new table in batches from the existing table, which minimizes downtime and blocking - similar to the batching process you already did, except that at the end you have a second copy of the data. Once the copy completes, drop the original table and rename the new one. You might also want to consider creating the index on the new table from the beginning.
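A rough sketch of this copy-and-swap approach, assuming an integer key [Id] and illustrative column types and batch bounds:

    -- New table with the persisted computed column in its definition
    CREATE TABLE [dbo].[T_new] (
        [Id] bigint NOT NULL PRIMARY KEY,
        [Y]  decimal(32, 6) NULL,
        [Z]  decimal(32, 6) NULL,
        [X]  AS ([Y] / NULLIF([Z], 0)) PERSISTED
    );

    -- Copy the data over in key-range batches
    DECLARE @From bigint = 0, @Step bigint = 100000, @Max bigint;
    SELECT @Max = MAX([Id]) FROM [dbo].[T];
    WHILE @From <= @Max
    BEGIN
        INSERT INTO [dbo].[T_new] ([Id], [Y], [Z])
        SELECT [Id], [Y], [Z]
        FROM [dbo].[T]
        WHERE [Id] > @From AND [Id] <= @From + @Step;
        SET @From += @Step;
    END

    -- Swap names inside one short transaction
    BEGIN TRAN;
    EXEC sp_rename 'dbo.T', 'T_old';
    EXEC sp_rename 'dbo.T_new', 'T';
    COMMIT;

In practice you would also need to capture rows inserted or updated during the copy (e.g. via a trigger or a final catch-up pass) before the rename, as the edits above note.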