Search code examples
sql-servert-sqlfor-xml-path

FOR XML PATH doesn't work with 0x001E character


I've checked : https://www.red-gate.com/simple-talk/sql/t-sql-programming/concatenating-row-values-in-transact-sql/ So, my query is:

  SELECT DISTINCT ID, NAME, DOCTEXT2
  FROM DOC
  CROSS APPLY (SELECT 
     Stuff((SELECT ' ' + RTRIM(LTRIM(DOCTEXT))  
        FROM DOC d 
        WHERE d.ID=DOC.ID AND d.NAME = DOC.NAME 
        FOR XML PATH (''), TYPE).value('.','varchar(max)'),1,1,'')
  ) D (DOCTEXT2)

The error is :

FOR XML could not serialize the data for node 'NoName' because it contains a character (0x001E) which is not allowed in XML. To retrieve this data using FOR XML, convert it to binary, varbinary or image data type and use the BINARY BASE64 directive.

I know I've 0x001E characters in the data. I don't want to replace this data in the database.

My data are :

 ID NAME    DOCTEXT 
 12 AB      ERROR INSTRUCTIONS                                        
 12 CC      CRN   70                             SS 
 12 CC      DRF 77                                 
 12 CC

What I'm trying to have is, something like:

 ID NAME    DOCTEXT 
 12 AB      ERROR INSTRUCTIONS                             
 12 CC      CRN   70 SS DRF 77

The query is working if the data doesn't have 0x001E character(s).

EDIT :

I tried: CAST ( REPLACE( DOCTEXT, char(0), '') AS VARCHAR) instead of RTRIM(LTRIM(DOCTEXT)), without success.


Solution

  • It's clunky, but you can convert back and forth between VARBINARY(MAX) and NVARCHAR(MAX) in hexstrings to avoid any issue with characters XML doesn't like in text:

    ;WITH D1 AS (
      SELECT ID, NAME
      FROM DOC
      GROUP BY ID, NAME
    )
    SELECT D1.ID, D1.NAME, DOCTEXT = 
      LTRIM(CONVERT(NVARCHAR(MAX), CONVERT(VARBINARY(MAX), D.DOCTEXT, 2)))
    FROM D1 CROSS APPLY (
      SELECT NULLIF(
        CONVERT(VARCHAR(MAX), CONVERT(VARBINARY(MAX), ' ' + LTRIM(RTRIM(D2.DOCTEXT))), 2), 
        0x)
      FROM DOC D2 
      WHERE D2.ID = D1.ID AND D2.[NAME] = D1.[NAME]
      FOR XML PATH('')
    ) D(DOCTEXT)
    

    We can't use BINARY BASE64 here, because concatenating two Base64 strings does not (generally) yield another Base64 string. The outer LTRIM() takes care of removing the initial space; use STUFF instead for more precise results if you like, but since we're trimming spaces of the internal strings it doesn't matter here.

    Note that there are other approaches for concatenating strings (most notably STRING_AGG in SQL Server 2017) and the article you linked to mentions them. This approach is not necessarily the best in terms of performance, but I haven't measured.