I'm loading datafiles into a staging table, which works perfectly fine, but after that I need to convert the data to several datatypes for the final table, and the insert statement becomes infeasible for big tables. The statements are created by a tool of mine, and I want to optimize this part. The insert statements look something like this, but with a lot more columns, validity checks, and some needed replacements inside these converts:
INSERT INTO foo
SELECT
    CONVERT(decimal(10,2), col1),
    CONVERT(date, col2),
    CONVERT(decimal(10,2), col3),
    CONVERT(int, col4),
    CONVERT(decimal(10,2), col5)
    ...
FROM foo_staging
On small tables this works pretty fast, but as the table gets bigger the performance becomes really bad. Any ideas on how to improve the performance of the converts?
EDIT: I'm using BULK INSERT to load the staging table, and the files are really big.
A few quick thoughts, as there's no one correct answer here:
For copying data from table to table, perhaps the fastest option is to use table partitioning. This can get complex; you'd need to read up on the concept and figure out whether it applies to your data or not. (Also, you'd need SQL Server Enterprise edition.)
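If partition switching does fit your data, the core of it is just a metadata operation. A minimal sketch, assuming a hypothetical foo_converted table that already holds converted rows, has a schema identical to foo, sits on the same filegroup, and carries a CHECK constraint matching the target partition's boundaries:

-- Switch an already-converted, constraint-checked chunk into the final table.
-- This only remaps metadata, so it's near-instant regardless of row count.
ALTER TABLE foo_converted
SWITCH TO foo PARTITION 3;   -- 3 = the partition whose range the CHECK constraint matches

The conversion work itself still has to happen somewhere, of course, but you get to do it one partition-sized piece at a time instead of in a single giant statement.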
An alternative is to "chunkify" the data being copied, by breaking it into (say) sets of 10,000 rows, or some similarly arbitrary amount, and looping until you are done. This can be done using the TOP N syntax, figuring out date or key ranges based on the data being loaded, or perhaps even tricks with row_number().
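For illustration, a minimal sketch of such a loop, assuming foo_staging has an ascending key column called id (the column name and batch size are placeholders; a date range or ROW_NUMBER() window would work along the same lines):

DECLARE @batch  INT    = 10000;
DECLARE @lastId BIGINT = 0;
DECLARE @nextId BIGINT;

WHILE 1 = 1
BEGIN
    -- find the upper key of the next chunk
    SELECT @nextId = MAX(id)
    FROM (SELECT TOP (@batch) id
          FROM foo_staging
          WHERE id > @lastId
          ORDER BY id) AS chunk;

    IF @nextId IS NULL BREAK;   -- nothing left to copy

    INSERT INTO foo
    SELECT
        CONVERT(decimal(10,2), col1),
        CONVERT(date, col2),
        CONVERT(decimal(10,2), col3),
        CONVERT(int, col4),
        CONVERT(decimal(10,2), col5)
    FROM foo_staging
    WHERE id > @lastId AND id <= @nextId;

    SET @lastId = @nextId;      -- each INSERT stays a small, short transaction
END

Keeping each INSERT down to a bounded number of rows keeps the individual transactions (and the log growth per statement) small, which is usually where a single huge INSERT ... SELECT starts to hurt.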
(Note again that availability of these features depends on the version/edition of SQL that you are using.)