I am having a problem with using bulk insert. The issue is that the source files (tab delimited) that I'm dealing with contain rows that end in cr/lf
without filling in values of the empty columns with tab for the rest of the row. So when the data is pulled into SQL Server, it's combining those shortened lines into the previous line. so basically it's combining multiple rows into one rather than writing it as two separate rows with nulls at the end of the first row.
Example to illustrate the problem: sample .txt file
column1 column2 column3 column4 column5
1 2 3 4 5
2 5 4 6
4 4 6 4
4 5 6 4 6
SQL to create table and bulk insert
CREATE TABLE test (
[column1] varchar(MAX) NULL,
[column2] varchar(MAX) NULL,
[column3] varchar(MAX) NULL,
[column4] varchar(MAX) NULL,
[column5] varchar(MAX) NULL
)
BULK INSERT test
FROM 'c:\temp\testimport.txt'
WITH
(
FIRSTROW = 2,
FIELDTERMINATOR = '\t',
ROWTERMINATOR = '\r'
);
The really strange thing is that I can use the data import wizard and it imports the data perfectly, without any issue, and handles the lack of tabs for the columns just fine. But I don't know what the wizard is doing behind the scenes to make this happen. I would love to have the code it uses to create the table and do the insert as that would probably answer my question for me. At the end of the day I can't use the wizard as this will eventually be part of an automated task I'll be running against an SQL Server Express database on multiple files with different names but the same column header.
Maybe bulk insert isn't the way to go here? Or there is something obvious I'm missing that someone else might know off the top of their head. Either way, all help is appreciated and thanks in advance.
As Tim H suggested I've made a few attempts at creating a format file to accommodate the data. Results so far are as follows.
Using
bcp temp.dbo.test format nul -x -f test_format.xml -n -T
produces
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<RECORD>
<FIELD ID="1" xsi:type="CharPrefix" PREFIX_LENGTH="2" MAX_LENGTH="100" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
<FIELD ID="2" xsi:type="CharPrefix" PREFIX_LENGTH="2" MAX_LENGTH="100" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
<FIELD ID="3" xsi:type="CharPrefix" PREFIX_LENGTH="2" MAX_LENGTH="100" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
<FIELD ID="4" xsi:type="CharPrefix" PREFIX_LENGTH="2" MAX_LENGTH="100" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
<FIELD ID="5" xsi:type="CharPrefix" PREFIX_LENGTH="2" MAX_LENGTH="100" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
</RECORD>
<ROW>
<COLUMN SOURCE="1" NAME="column1" xsi:type="SQLVARYCHAR"/>
<COLUMN SOURCE="2" NAME="column2" xsi:type="SQLVARYCHAR"/>
<COLUMN SOURCE="3" NAME="column3" xsi:type="SQLVARYCHAR"/>
<COLUMN SOURCE="4" NAME="column4" xsi:type="SQLVARYCHAR"/>
<COLUMN SOURCE="5" NAME="column5" xsi:type="SQLVARYCHAR"/>
</ROW>
</BCPFORMAT>
Using this temp file as is produces......
Msg 4866, Level 16, State 7, Line 31 The bulk load failed. The column is too long in the data file for row 1, column 1. Verify that the field terminator and row terminator are specified correctly.
My attempt to edit the XML to work.....
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<RECORD>
<FIELD ID="1" xsi:type="CharTerm" TERMINATOR="\t" MAX_LENGTH="100" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
<FIELD ID="2" xsi:type="CharTerm" TERMINATOR="\t" MAX_LENGTH="100" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
<FIELD ID="3" xsi:type="CharTerm" TERMINATOR="\t" MAX_LENGTH="100" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
<FIELD ID="4" xsi:type="CharTerm" TERMINATOR="\t" MAX_LENGTH="100" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
<FIELD ID="5" xsi:type="CharTerm" TERMINATOR="\r\n" MAX_LENGTH="100" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
</RECORD>
<ROW>
<COLUMN SOURCE="1" NAME="column1" xsi:type="SQLVARYCHAR"/>
<COLUMN SOURCE="2" NAME="column2" xsi:type="SQLVARYCHAR"/>
<COLUMN SOURCE="3" NAME="column3" xsi:type="SQLVARYCHAR"/>
<COLUMN SOURCE="4" NAME="column4" xsi:type="SQLVARYCHAR"/>
<COLUMN SOURCE="5" NAME="column5" xsi:type="SQLVARYCHAR"/>
</ROW>
</BCPFORMAT>
Which does insert the data, but unfortunately still produces the same jumbled insert with overlapping lines in the same row.
Never found a direct solution from SQL express for this. I ended up going with PowerShell scripting to solve the problem. Import-CSV pulled the data from the files uniformly and without issue. Not sure why, but it handled the data far better than SQL did. From there I used variables for each line and Invoke-SQLCmd and some SQL scripting to import them into the DB. Worked like a charm. Since this process is all on the local server there aren't any security issues to worry about, so it was an acceptable solution. Thanks again for all the suggestions and help though.