Search code examples
sql-serverssissql-server-2008-r2dts

Merge csv file with db table


I have to load daily a lot of data out of files into a database table. The data is out of an ERP-System and has 5,000,000 records and is delivered in csv files up to 500mb. There is already a integration service job which is responsible for this and a transformation, data check and import into another system. I forgot to say there is no single id column, else it would be easy. (3 to 7 id columns) It is also not possible that the ERP-system only provides the data which has changed since the last execution.

The problem is to improve the merge of the csv files into data table. The table has to hold all the time the whole data of the previous execution.

Through bad performance I have to renew this import. Following solution have been validated:

  • TSQL-Merge: Really bad performance, import will need too long
  • Custom DTS Task: I need to know which record is new, updated or deleted. Only loading the db table and the csv threw an OutOfMemoryException
  • Compare the file outside integration services with the previous file and import only the delta. It is a nice solution but in reality we have a lot of errors because the file is different with the previous load.

Now the question, does anybody have experience with such requirements? Does anybody have a good suggestion to solve this problem?

Thanks in advice!

MH


Solution

  • I haven't started the implementation yet, but after I talked to a Integration Services specialist, it is the best solution to solve this by a temp table and a look up. First you need to create the temp table with a TSQL-Task and the you insert the records of the file with the usual copy task. In the look up you decide if it is an insert or an update.

    In the next two weeks I will work on the implementation, so don't hesitate to ask me about further details.