Search code examples
sql-serverunicodeazure-sql-databasebulkinsertbcp

Insert rows with Unicode characters using BCP


I'm using BCP to bulk upload data from a CSV file to SQL Azure (because BULK INSERT is not supported). This command runs and uploads the rows:

bcp [resource].dbo.TableName in C:\data.csv -t "," -r "0x0a" -c -U bcpuser@resource -S tcp:resource.database.windows.net

But data.csv is UTF8 encoded and contains non-ASCII strings. These get corrupted. I've tried changing the -c option to -w:

bcp [resource].dbo.TableName in C:\data.csv -t "," -r "0x0a" -w -U bcpuser@resource -S tcp:resource.database.windows.net

But then I get '0 rows copied'.

What am I doing wrong and how do I bulk insert Unicode characters using BCP?


Solution

  • But data.csv is UTF8 encoded

    The UTF-8 encoding is the primary issue. Using -w won't help because in Microsoft-land, the term "Unicode" nearly always refers to UTF-16 Little Endian.

    The solution will depend on which version of BCP you are using as an option was added in the newest version (13.0 / 2016):

    • If you are using BCP that came with SQL Server prior to SQL Server 2016 (version 13.0) then you need to convert the csv file to UTF-16 Little Endian (LE) as that is what Windows / SQL Server / .NET use for all strings. And use the -w switch.

      I got this to work encoding a file as "UCS-2 LE BOM" in Notepad++, whereas that same import file failed using the -c switch.

    • If you are using BCP that came with SQL Server 2016 (version 13.0) or newer, then you can simply add -c -C 65001 to the command line. -C is for "code page", and 65001 is the code page for UTF-8.

    The MSDN page for bcp Utility states (in the explanation of the -C switch):

    Versions prior to version 13 (SQL Server 2016) do not support code page 65001 (UTF-8 encoding). Versions beginning with 13 can import UTF-8 encoding to earlier versions of SQL Server.

    UPDATE

    Support for UTF-8 / code page 65001 was added to SQL Server 2014 via SP2, as noted in this Microsoft KB article:

    UTF-8 encoding support for the BCP utility and BULK INSERT Transact-SQL command in SQL Server 2014 SP2