I've found nothing on Google or SO that quites lines up with my issue.
In SQL Server, I have a scalar function (we'll call it dbo.MySqlStringFunction
).
What this function does is call a utility written in C# that calls an ASP.Net view and returns the HTML as a SqlString.
The function definition in SQL Server is:
RETURNS [nvarchar](max) WITH EXECUTE AS CALLER
AS EXTERNAL NAME [Utils.UserDefinedFunctions].[MySqlStringFunction]
The C# code simplified is:
var request = (HttpWebRequest)WebRequest.Create("www.mydomain.com");
using (var response = (HttpWebResponse)request.GetResponse())
using (var stream = response.GetResponseStream())
{
using (var streamReader = new StreamReader(stream, Encoding.UTF8)
{
return new SqlString(streamReader.ReadToEnd());
}
}
When I put the C# code into a console app and run it, I get everything exactly as it should be.
When I access the URL directly in my browser, it displays exactly as it should be.
When I do SELECT MySqlStringFunction()
however, characters such as ™, §, ¤ display as 2 or 3 question marks each.
It appears that it is somewhere between the return new SqlString(..)
and the sql function returning the value that something is going wonky. But I'm at a loss as to what it could be.
It seems that the issue was the location of the return
. The current code (shown in the Question), is returning in the middle of 3 using
blocks, one of which is the UTF-8 stream being read. This probably confused things as SQLCLR is isolated memory from the main SQL Server memory, and usually you can't return via a stream. It is best to close the open stream first and let the using
blocks call Dispose()
. Hence:
using
(i.e. string _TempReturn = String.Empty;
)using
, replace return
with: _TempReturn = streamReader.ReadToEnd();
using
closing bracket, add: return new SqlString(_TempReturn);
(old answer, will remove in the near future)
The problem is with the encoding difference between the web page and SQL Server. You are using Encoding.UTF8
for the web page (which is quite likely correct given that UTF-8 is the most common encoding for the interwebs), but SQL Server (along with .NET and Windows in general) is UTF-16 Little Endian. This is why you are getting 2 or 3 ?
s for each character above Code Point 127: UTF-8 is a multi-byte encoding that uses 1, 2, or 3 bytes per character, whereas UTF-16 is always 2-bytes (well, supplementary characters are 4 bytes, but that is due to being a pair of double-byte values).
You need to convert the encoding to UTF-16 Little Endian before, or as, you pass back the stream. And, UTF-16 Little Endian is the Unicode
encoding in .NET, while Big Endian Unicode
refers to "UTF-16 Big Endian". So you want to convert to the Unicode
encoding.
OR, it could be the reverse: that the web page is NOT UTF-8, in which case you have declared it incorrectly in the StreamReader
. If this is true, then you need to specify the correct encoding in the StreamReader
constructor.