Tags: sql-server, staging

Convert data from staging table to final table


I'm loading data files into a staging table, which works perfectly fine, but after that I need to convert several columns to different datatypes for the final table, and the insert statement becomes infeasible for big tables. The statements are generated by a tool of mine, and I want to optimize this part. The insert statements look something like this, but with many more columns, validity checks, and some necessary replacements inside the converts.

INSERT INTO foo
SELECT
    CONVERT(decimal(10,2), col1),
    CONVERT(date,          col2),
    CONVERT(decimal(10,2), col3),
    CONVERT(int,           col4),
    CONVERT(decimal(10,2), col5)
    ...
FROM foo_staging

On small tables this works pretty fast, but as the table grows, the performance becomes really bad. Any ideas on how to improve the performance of the conversion?

EDIT: I'm using BULK INSERT into the staging table, and the files are really big.


Solution

  • A few quick thoughts, as there's no one correct answer here:

    1. Load data into staging
    2. Clean/Convert by copying to a second staging table defined using the desired datatypes. Good data copied over, bad data left behind
    3. Copy data from the "clean" table to the "live" table
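    A sketch of step 2, assuming SQL Server 2012 or later for `TRY_CONVERT` (the table and column names here are illustrative, not the OP's actual schema):

    ```sql
    -- foo_clean is a second staging table defined with the target datatypes.
    -- TRY_CONVERT returns NULL instead of raising an error on bad input,
    -- so invalid rows can be filtered out rather than aborting the whole insert.
    INSERT INTO foo_clean (col1, col2, col4)
    SELECT TRY_CONVERT(decimal(10,2), col1),
           TRY_CONVERT(date,          col2),
           TRY_CONVERT(int,           col4)
    FROM foo_staging
    WHERE TRY_CONVERT(decimal(10,2), col1) IS NOT NULL
      AND TRY_CONVERT(date,          col2) IS NOT NULL
      AND TRY_CONVERT(int,           col4) IS NOT NULL;
    ```

    Inverting the `WHERE` clause gives you the rejected rows, which can be inserted into an error table for inspection.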

    For copying data from table to table, perhaps the fastest option is to use table partitioning. This can get complex; you'd need to read up on the concept and figure out whether it applies to your data or not. (Also, you'd need SQL Server Enterprise edition.)
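    If partitioning does apply, the actual move is a metadata-only operation, so it's effectively instant regardless of row count. A hedged sketch, assuming `foo_clean` and `foo` share the same columns, indexes, constraints, and partition scheme (a hard requirement for `SWITCH`):

    ```sql
    -- Sketch only: source and target must be partitioned identically.
    -- SWITCH reassigns the partition's data pages to the target table;
    -- no rows are physically copied.
    ALTER TABLE foo_clean
        SWITCH PARTITION 1 TO foo PARTITION 1;
    ```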

    An alternative is to "chunkify" the data being copied, by breaking it into (say) sets of 10,000 rows, or some similarly arbitrary amount, and looping until you are done. This can be done using the TOP N syntax, figuring out date or key ranges based on the data being loaded, or perhaps even tricks with ROW_NUMBER().
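    A minimal sketch of the chunked approach, assuming the staging table has an ascending integer key column `id` to use as a watermark (the column and batch size are assumptions, not from the original question):

    ```sql
    -- Illustrative chunked copy: assumes foo_staging has an integer key id.
    DECLARE @batch  int    = 10000,
            @lastId bigint = 0,
            @maxId  bigint;

    WHILE 1 = 1
    BEGIN
        -- find the upper key bound of the next chunk
        SELECT @maxId = MAX(id)
        FROM (SELECT TOP (@batch) id
              FROM foo_staging
              WHERE id > @lastId
              ORDER BY id) AS nxt;

        IF @maxId IS NULL BREAK;   -- nothing left to copy

        -- each iteration converts and inserts one small key range,
        -- keeping individual transactions (and the log) manageable
        INSERT INTO foo (col1, col2)
        SELECT CONVERT(decimal(10,2), col1),
               CONVERT(date,          col2)
        FROM foo_staging
        WHERE id > @lastId AND id <= @maxId;

        SET @lastId = @maxId;
        SET @maxId  = NULL;
    END
    ```

    Smaller batches trade throughput for shorter transactions and less log growth; the right size depends on your hardware and recovery model.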

    (Note again that availability of these features depends on the version/edition of SQL that you are using.)