I have table written to by an application. The field is varchar(max). The data looks like xml.
DECLARE @poit VARCHAR(100)
SET @poit = '<?xml version="1.0" encoding="utf-8"?><test>VÍA</test>'
SELECT CONVERT(XML,@poit)
But (seemingly because of the UTF8; removing it works), I get this error:
XML parsing: line 1, character 46, illegal xml character
Is there a way to cleanly convert it?
I found this thread, which talks about varchar not supporting "non-ASCII characters", though obviously the I is non-unicode. Yes, I can do this:
SELECT CONVERT(XML, REPLACE(@poit, 'encoding="utf-8"', ''))
But is that the best way?
Why does casting a UTF-8 VARCHAR column to XML require converting to NVARCHAR and encoding change?
I would try changing the datatype of your @poit
variable from VARCHAR(100)
to NVARCHAR(100)
. Then replace the utf-8 encoding with utf-16 so your code would look something like:
DECLARE @poit NVARCHAR(100)
SET @poit = '<?xml version="1.0" encoding="utf-8"?><test>VÍA</test>'
SELECT CONVERT(XML,REPLACE(@poit, 'utf-8', 'utf-16'))
As long as you're not calling the conversion with the replace in it in a SELECT that returns oodles of results, the performance should be just fine and it will get the job done.
Reference: http://xml.silmaril.ie/characters.html <- scroll down and you'll see some info as to the difference between utf-8 & utf-16. Hope this helps!