Environment: VB.NET, VS 2022 (latest build), SQL Server 2019 (latest patch)
I'm attempting to execute a simple INSERT of varchar
data containing certain UTF-8 data, and regularly getting the SqlException "Invalid data for UTF8-encoded characters" when a single character is surrounded by left or right curly quotes (e.g. [left double quote]J[right double quote] -- like "J" with curly double quotes around it).
The error also occurs with curly single quotes, but DOES NOT occur with similar data not consisting of single characters not bounded in such a way (e.g. the contraction "isn[apostrophe]t" -- "isn't" with a curly apostrophe).
My code is similar to:
Dim SQL As String = "INSERT INTO MyTable ([MyField]) VALUES (@MyField)"
Dim curName As String
Using cmd As New SqlCommand(SQL, conn)
cmd.Parameters.Add("@MyField", SqlDbType.VarChar, 80)
While ...
...loop through filereader taking in UTF-8 encoded text data
curName = Await oRead.ReadLineAsync()
cmd.Parameters("@MyField").Value = curName
Try
Await cmd.ExecuteNonQueryAsync()
Catch ex As Exception
Debug.Print("Error: " & ex.Message) ' <--- Shows error here
End Try
End While
End Using
The column is set to varchar
, with the database collation set to Latin1_General_100_CI_AS_SC_UTF8.
Any idea what's going, or if a workaround is possible?
I don’t fully understand the underlying issue — my best guess is that System.Data.SqlClient is old and simply doesn’t support UTF-8 collations — but I found a couple different workarounds:
Declare your SqlParameter as NVarChar
instead of VarChar
:
cmd.Parameters.Add("@MyField", SqlDbType.NVarChar, 80)
Now the string will be successfully passed to SQL Server as UCS-2/UTF-16. SQL Server will then automatically convert the string to UTF-8 when it inserts the string into the table.
Upgrade from System.Data.SqlClient to the Microsoft.Data.SqlClient NuGet package. (The Release Notes mention UTF-8 support.) If you go with this option, you must import SQL-related classes like SqlConnection and SqlCommand from the Microsoft.Data.SqlClient
namespace instead of System.Data.SqlClient
:
'Imports System.Data.SqlClient
Imports Microsoft.Data.SqlClient
In the .NET Framework, the internal TdsParser class is responsible for sending queries to SQL Server in binary format. The class contains the following lines of code (2410-2411):
_defaultCodePage = Int32.Parse(stringCodePage, NumberStyles.Integer, CultureInfo.InvariantCulture);
_defaultEncoding = System.Text.Encoding.GetEncoding(_defaultCodePage);
This suggests that for all Latin1_General collations, TdsParser encodes strings using code page 1252, which is wrong for UTF-8 collations and produces garbage bytes.