Search code examples
c#sqlsql-server-2008bulkinsertinsert-into

Convert a file full of "INSERT INTO xxx VALUES" in to something Bulk Insert can parse


This is a followup to my first question "Porting “SQL” export to T-SQL".

I am working with a 3rd party program that I have no control over and I can not change. This program will export it's internal database in to a set of .sql each one with a format of:

INSERT INTO [ExampleDB] ( [IntField] , [VarcharField], [BinaryField])
VALUES
(1 , 'Some Text' , 0x123456),
(2 , 'B' , NULL),
--(SNIP, it does this for 1000 records)
(999, 'E' , null);
(1000 , 'F' , null);

INSERT INTO [ExampleDB] ( [IntField] ,  [VarcharField] , BinaryField)
VALUES
(1001 , 'asdg', null),
(1002 , 'asdf' , 0xdeadbeef),
(1003 , 'dfghdfhg' , null),
(1004 , 'sfdhsdhdshd' , null),
--(SNIP 1000 more lines)

This pattern continues till the .sql file has reached a file size set during the export, the export files are grouped by EXPORT_PATH\%Table_Name%\Export#.sql Where the # is a counter starting at 1.

Currently I have about 1.3GB data and I have it exporting in 1MB chunks (1407 files across 26 tables, All but 5 tables only have one file, the largest table has 207 files).

Right now I just have a simple C# program that reads each file in to ram then calls ExecuteNonQuery. The issue is I am averaging 60 sec/file which means it will take about 23 hrs for it to do the entire export.

I assume if I some how could format the files to be loaded with a BULK INSERT instead of a INSERT INTO it could go much faster. Is there any easy way to do this or do I have to write some kind of Find & Replace and keep my fingers crossed that it does not fail on some corner case and blow up my data.

Any other suggestions on how to speed up the insert into would also be appreciated.


UPDATE:

I ended up going with the parse and do a SqlBulkCopy method. It went from 1 file/min. to 1 file/sec.


Solution

  • Well, here is my "solution" for helping convert the data into a DataTable or otherwise (run it in LINQPad):

    var i = "(null, 1 , 'Some''\n Text' , 0x123.456)";
    var pat = @",?\s*(?:(?<n>null)|(?<w>[\w.]+)|'(?<s>.*)'(?!'))";
    Regex.Matches(i, pat,
          RegexOptions.IgnoreCase | RegexOptions.Singleline).Dump();
    

    The match should be run once per value group (e.g. (a,b,etc)). Parsing of the results (e.g. conversion) is left to the caller and I have not tested it [much]. I would recommend creating the correctly-typed DataTable first -- although it may be possible to pass everything "as a string" to the database? -- and then use the information in the columns to help with the extraction process (possibly using type converters). For the captures: n is null, w is word (e.g. number), s is string.

    Happy coding.