I have a 55Gb Fact table where I have to delete some records which later on can be reverted back. Number of deleted records vary between 10 to 100 thousand.
Currently my delete strategy is based on this: I update the dateKey for the records to be deleted e.g. from positive int 20080122 to negative int -20080122 so that current date filters don't include it.
My thinking here is that instead of moving data out and back in the fact table I make the date out of filter date range and then move it back into filterable date range by use of updates on dateKey.
I would like to hear your views on this delete strategy especially around NCI (non clustered index) behavior. Do you think updating the indexed dateKey is better than moving actual data?
Rather than re-purpose the dateKey
column, our standard practice is to add a "soft delete" column to the table, either an "is_deleted" bit column or a "deleted_on" datetime column and use that column to filter out "deleted" rows.
This requires more work on your part as all of your existing queries will have to be modified to use this new column, but now your database doesn't have to do the work of re-indexing or deleting/inserting actual data.