Search code examples
sql-serverxmlencodingfor-xml-path

How to properly convert accented characters using FOR XML


I'm trying to take a string in SQL Server that contains an accented character (the acute e specifically) and have it convert correctly when using FOR XML PATH as follows:

SELECT 'é' AS Accent
FOR XML PATH('')

/* Produces: <Accent>&eacute;</Accent> */

However, it always comes through with the accent, or tries to convert the other characters instead. This is what I've tried so far:

SELECT 'é' AS Accent
FOR XML PATH('')

/* Produces: <Accent>é</Accent> */

SELECT N'é' AS Accent
FOR XML PATH('')

/* Produces: <Accent>é</Accent> */

SELECT REPLACE('é', 'é', '&eacute;') AS Accent
FOR XML PATH('')

/* Produces: <Accent>&amp;eacute;</Accent> */

SELECT '<![CDATA[' + 'é' + ']]>' AS Accent
FOR XML PATH('')

/* Produces: <Accent>&lt;![CDATA[é]]&gt;</Accent> */

SELECT CAST('<![CDATA[' + 'é' + ']]>' AS XML) AS Accent
FOR XML PATH('')

/* Produces: <Accent>é</Accent> */

I've looked for quite a while and can't find anything apart from casting the end result XML into a string and manually replacing the character - I'm looking for a more correct way to do this. Is this just a bug in SQL Server? It seems to convert plenty of other characters (<, >, &, etc...).

Huge thanks in advance for any assistance.

I'm using SQL Server 2008.


Solution

  • I don't know for sure, but I'd assume, that this is not possible without a hack.

    SQL-Server treats XML as 2 byte utf-16 internally. So all characters, which are displayable within this range do not need escaping.

    It is a different thing, when it comes to unprintable characters or letters, which have a declarative meaning, like <, > and & or quotes (in some cases).

    I'd probably use a cast to NVARCHAR(MAX) and then a replace on string level at the point where you export this data. You won't be able to cast this back to XML:

    --Push your test XML into a variable

    DECLARE @xml XML=
    (
        SELECT 'é' AS Accent
        FOR XML PATH('')
    );
    SELECT @xml; --<Accent>é</Accent>
    

    --Cast it to string and do the replacement on string-level

    SELECT REPLACE(CAST(@xml AS NVARCHAR(MAX)),'é', '&eacute;') --<Accent>&eacute;</Accent>
    

    --Trying to cast this result back to XML fails

    SELECT CAST(REPLACE(CAST(@xml AS NVARCHAR(MAX)),'é', '&eacute;') AS XML); 
    

    XML parsing: line 1, character 16, well formed check: undeclared entity

    Obvioulsy SQL Server's XML engine doesn't even know this entity...