Tags: .net, sql-server, sql-server-2008, denormalization, sql-job

Most efficient way to move table rows from one table to another


I've created a denormalized table that needs to be inserted/updated every hour. The process is rather involved from a data perspective, so I am looking for a recommended way to update the table without disrupting the users.

I am thinking of having a separate table that my process inserts/updates into; once that is complete, I need a way to push those changes to my live production table.

Any help would be great!


Solution

  • Another solution is to use multiple schemas and play switch-a-roo. I prefer this method only because I used to do this trick in a job, and the warning message about renaming an object (which can't be suppressed) was filling up my history logs. Basically, you need two additional schemas: one to hold a copy of the table temporarily, and one to hold the cached copy.

    CREATE SCHEMA cache AUTHORIZATION dbo;
    CREATE SCHEMA hold  AUTHORIZATION dbo;
    

    Now, create a mimic of the table in the cache schema:

    -- [table] is a placeholder; the brackets are needed because TABLE is a reserved keyword
    SELECT * INTO cache.[table] FROM dbo.[table] WHERE 1 = 0;
    -- then create any indexes etc. (SELECT ... INTO does not copy them)
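
    For example, if the production table has a clustered primary key, you might mirror it like this (the column name id is hypothetical; match your real table's indexes):

    CREATE UNIQUE CLUSTERED INDEX cix_table ON cache.[table] (id);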
    

    Now when it comes time to refresh the data:

    -- step 1:
    TRUNCATE TABLE cache.[table];
    -- (TRUNCATE fails if other tables reference this one via foreign keys;
    --  if you need to maintain FKs, use DELETE instead)
    INSERT INTO cache.[table] SELECT ...
    
    -- step 2:
    -- this transaction will be almost instantaneous, 
    -- since it is a metadata operation only: 
    
    BEGIN TRANSACTION;
      ALTER SCHEMA hold  TRANSFER dbo.[table];
      ALTER SCHEMA dbo   TRANSFER cache.[table];
      ALTER SCHEMA cache TRANSFER hold.[table];
    COMMIT TRANSACTION;
    

    Theoretically, you could move the last transfer outside the transaction, because users can start querying the new copy of dbo.table after the second transfer. But like I said, this is almost instantaneous, so I'd be surprised if you saw any difference in concurrency.
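
    A sketch of that variant, using the same placeholder names:

    BEGIN TRANSACTION;
      ALTER SCHEMA hold TRANSFER dbo.[table];
      ALTER SCHEMA dbo  TRANSFER cache.[table];
    COMMIT TRANSACTION;
    -- dbo.[table] is live again at this point; park the old copy whenever:
    ALTER SCHEMA cache TRANSFER hold.[table];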

    You could also optionally truncate cache.table again here, but I always kept it populated so I could compare data changes or troubleshoot if something went wrong. Depending on how long step 1 takes, it may be faster to perform the transfers in reverse than to re-populate from scratch.
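
    Because the three transfers form a cycle, running the same transaction again swaps the copies back. A sketch of that fallback, assuming cache.[table] still holds the previous data:

    BEGIN TRANSACTION;
      ALTER SCHEMA hold  TRANSFER dbo.[table];   -- new copy moves aside
      ALTER SCHEMA dbo   TRANSFER cache.[table]; -- previous copy goes live again
      ALTER SCHEMA cache TRANSFER hold.[table];  -- new copy parked for inspection
    COMMIT TRANSACTION;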

    Like rename, you can get wonky things from this process, such as statistics getting lost: they move with the actual table, not with the name. And like rename, you'll want to test this out, and you may want to play around with isolation levels, e.g. RCSI (read committed snapshot isolation) for accessing the reporting table.
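
    As a sketch of the latter (the database name is a placeholder, and enabling RCSI affects the whole database, so test it separately):

    ALTER DATABASE YourDatabase SET READ_COMMITTED_SNAPSHOT ON
      WITH ROLLBACK IMMEDIATE;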