Search code examples
sqlsql-servercsvssmsbulkinsert

How to import a CSV with a TIME column on SSMS (SQL)


I have a CSV with the following format

  COL1     |   COL2
 20211002  |  163136
(YYYYMMDD) | (HHMMSS)

On SQL I am creating a table with the following format

CREATE TABLE dbo.[table] (
    [COL1] DATE,
    [COL2] TIME,
);

BULK LOAD:

BULK INSERT dbo.[table]
FROM 'path\df_2.csv'
WITH (FIRSTROW=2, MAXERRORS=0, FIELDTERMINATOR=',' , ROWTERMINATOR='\n');

The COL1 is working perfectly and being read, however COL02 is giving me trouble and giving me the following error

Msg 4864, Level 16, State 1, Line 17
Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 2, column 13 (COL2).

How can I fix this?


Solution

  • The problem is that your COL2 data does not adhere to the t-sql TIME format of hh:mm:ss. If you are unable to redo your source file to get that column into the proper format you can transform the data during load with OPENROWSET, instead of using BULK LOAD.

    First create a format file like this (name it fmt.xml or whatever):

    <?xml version="1.0"?>
    <BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
     <RECORD>
      <FIELD ID="1" xsi:type="CharTerm" TERMINATOR=","/>
      <FIELD ID="2" xsi:type="CharTerm" TERMINATOR="\n"/>
     </RECORD>
     <ROW>
      <COLUMN SOURCE="1" NAME="COL1" xsi:type="SQLDATE"/>
      <COLUMN SOURCE="2" NAME="COL2" xsi:type="SQLVARYCHAR"/>
     </ROW>
    </BCPFORMAT>
    

    Then use this to bulk load your data:

    INSERT dbo.[table] (COL1, COL2)
    SELECT
        COL1,
        CONCAT(LEFT(COL2,2), ':', SUBSTRING(COL2, 3, 2),':', SUBSTRING(COL2, 5, 2)) COL2
    FROM OPENROWSET
    (
        BULK 'path\df_2.csv',
        CODEPAGE = 'RAW',
        FIRSTROW = 2,
        FORMATFILE = 'path\fmt.xml' --the path to your format file from above
    ) AS csv;
    

    The trick here is that OPENROWSET reads your data for COL2 into memory as a VARCHAR during the bulk load, as specified in the format file. Then you can manipulate it using plain t-sql, converting it into a compatible format for TIME as you insert the data from memory into your table.

    Assuming your COL2 data is 6 characters in all situations that should do it for you. If your data is NOT 6 characters in places you'll need to account for that in the SELECT;