Search code examples
sql-servercsvssmsbulkinsert

SQL BULK INSERT Format Issues


I'm using SQL Server and Management Studio.

I was getting data conversion errors when performing a BULK INSERT into a table, that I could not resolve and switching to using a format file allowed data to be inserted, but the data inserted was wrong.

I initially had ~2000 rows, but reduced down and can reproduce the same issue with a single row of data.

Here are the database table declarations:

-- Create the Signals table
CREATE TABLE Signals 
(
    ID INT IDENTITY(1,1) PRIMARY KEY,
    SignalName NVARCHAR(255) NOT NULL,
    SignalUnit NVARCHAR(50)
);
GO

-- Create the Series table
CREATE TABLE Series 
(
    ID INT IDENTITY(1,1) PRIMARY KEY,
    SeriesType NVARCHAR(255) NOT NULL,
    ReferenceSignalID INT,
    FOREIGN KEY (ReferenceSignalID) REFERENCES Signals(ID)
);
GO

-- Create the SeriesData table
CREATE TABLE SeriesData 
(
    ID INT IDENTITY(1,1) PRIMARY KEY,
    SeriesID INT,
    SignalID INT,
    SignalValue FLOAT NULL, -- I queried the precision of this and it is "53"
    ReferenceSignalValueID INT NULL,
    FOREIGN KEY (SeriesID) REFERENCES Series(ID),
    FOREIGN KEY (SignalID) REFERENCES Signals(ID),
    FOREIGN KEY (ReferenceSignalValueID) REFERENCES SeriesData(ID)
);
GO

Example CSV content (reduced from ~2000 rows as example):

29,953,0.0,
29,953,0.01000213623046875,
29,953,0.0200042724609375,

There are no headers, but columns represent SeriesID | SignalID | SignalValue | ReferenceSignalValueID (all blank in this example - to be NULL in the database)

Line ending characters are "CR""LF" - encoding is UTF-8.

I scanned down and cannot see any rogue/hidden/invalid characters, all formats seem to match the target.

SQL code doing the BULK INSERT:

BULK INSERT SeriesData
FROM 'myPath\myData.csv'
WITH (
    --FIELDTERMINATOR = ',',
    --ROWTERMINATOR = '\r\n', -- also "\n" has an issue, but more errors
    FORMATFILE = 'myPath\SeriesData.fmt',
    CODEPAGE = '65001' -- obtained this from AI chats for UTF-8 encoding
    --KEEPNULLS -- no change using this
)

The .fmt file I have contains:

10.0
4
1       SQLINT              0       4       ","     1     SeriesID               ""
2       SQLINT              0       4       ","     2     SignalID               ""
3       SQLFLT8             0       8       ","     3     SignalValue            ""
4       SQLINT              0       4       "\r\n"  4     ReferenceSignalValueID ""

(received this from an AI chat)

The result in the SeriesData table is like this:

ID      SeriesID    SignalID    ReferenceSignalValueID
----------------------------------------------------------
6166    3355961         0   NULL    NULL
6167    3355961         0   NULL    NULL
6168    3355961         0   NULL    NULL

For every row of the CSV the SeriesID is always 3355961 instead of 29, the SignalID is always 0 instead of 953, and the SignalID is always NULL instead of the actual float value.

The AI chat I was using to debug and learn about how to do this suggested the following .fmt content (but using this resulted in the same type mismatch error):

10.0
4
1       SQLCHAR             0       12      ","     1     SeriesID               ""
2       SQLCHAR             0       12      ","     2     SignalID               ""
3       SQLCHAR             0       20      ","     3     SignalValue            ""
4       SQLCHAR             0       12      "\r\n"  4     ReferenceSignalValueID ""

Does anyone know how I can get SQL to interpret the values correctly, or is there something else fundamentally wrong?

For further info, previous attempts were using FIELDTERMINATOR and ROWTERMINATOR (as commented-out above) and in those cases I was getting an error:

Msg 4864, Level 16, State 1, Line 79
Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 1, column 3 (SignalID).

This is concerning seeing as SignalID should be column 2, not 3 - does this suggest something wrong with the CSV or SQL reading of it and the .fmt file approach is not sufficient/distracting from the "real" issue?


Solution

  • Please try the following solution that will work starting from SQL Server 2017 onwards.

    Notable points:

    • No format file is needed.
    • Please pay attention to a db view dbo.vSeriesData to accommodate presence of IDENTITY column. The BULK INSERT loads data into the target table through the view.

    e:\Temp\myData.csv file

    29,953,0.0,
    29,953,0.01000213623046875,
    29,953,0.0200042724609375,
    

    SQL

    USE tempdb;
    GO
    
    DROP VIEW IF EXISTS dbo.vSeriesData;
    DROP TABLE IF EXISTS dbo.SeriesData;
    GO
    
    CREATE TABLE dbo.SeriesData (
        ID INT IDENTITY(1,1) PRIMARY KEY,
        SeriesID INT,
        SignalID INT,
        SignalValue FLOAT NULL,
        ReferenceSignalValueID INT NULL
    );
    GO
    
    CREATE VIEW dbo.vSeriesData
    AS 
    SELECT SeriesID,
        SignalID,
        SignalValue,
        ReferenceSignalValueID
    FROM dbo.SeriesData;
    GO
    
    BULK INSERT dbo.vSeriesData
    FROM N'e:\Temp\myData.csv'
    WITH (
       FORMAT='CSV' -- starting from SQL Server 2017 onwards
       , FIRSTROW = 1
       , FIELDTERMINATOR = ','
       , ROWTERMINATOR = '\n');
    
    -- test
    SELECT * FROM dbo.SeriesData;
    

    Output

    ID SeriesID SignalID SignalValue ReferenceSignalValueID
    1 29 953 0 NULL
    2 29 953 0.0100021362304688 NULL
    3 29 953 0.0200042724609375 NULL