Search code examples
sql-serversql-server-2014

SQL Server - Remove all non-printable ASCII characters


We recently migrated from SQL Server 2012 to SQL Server 2014 and all our FOR XML code started throwing errors about non-printable ASCII characters. I wrote this horrible function to remove non-printable ASCII characters as a quick fix. I want to replace it with something cleaner. Is there a way to do this?

ALTER FUNCTION [dbo].[remove_non_printable_chars]
(@input_string nvarchar(max))
RETURNS nvarchar(max)
BEGIN
    RETURN
    REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
    REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
    REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
    REPLACE(REPLACE(@input_string,
        CHAR(1), ''),CHAR(2), ''),CHAR(3), ''),CHAR(4), ''),CHAR(5), ''),CHAR(6), ''),CHAR(7), ''),CHAR(8), ''),CHAR(9), ''),CHAR(10), ''),
        CHAR(11), ''),CHAR(12), ''),CHAR(13), ''),CHAR(14), ''),CHAR(15), ''),CHAR(16), ''),CHAR(17), ''),CHAR(18), ''),CHAR(19), ''),CHAR(20), ''),
        CHAR(21), ''),CHAR(22), ''),CHAR(23), ''),CHAR(24), ''),CHAR(25), ''),CHAR(26), ''),CHAR(27), ''),CHAR(28), ''),CHAR(29), ''),CHAR(30), ''),
        CHAR(31), ''), NCHAR(0) COLLATE Latin1_General_100_BIN2, '')
END

Here's the FOR XML code that broke. (I did not write this. It was already in the code base).

SELECT @HTMLTableData =
(
    SELECT  HTMLRows 
    FROM (
        SELECT N'<tr>' + HTMLRow + N'</tr>' AS HTMLRows 
        FROM @HTMLRowData
    ) mi            
    FOR XML PATH(''), TYPE
).value('/', 'NVARCHAR(MAX)')

Solution

  • Another Option.

    This function will replace control characters and correct any residual repeating spaces. For example Jane Smith{13}was here will not be returned as Jane Smithwas here, but rather Jane Smith was here

    CREATE FUNCTION [dbo].[udf-Str-Strip-Control](@S varchar(max))
    Returns varchar(max)
    Begin
        ;with  cte1(N) As (Select 1 From (Values(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) N(N)),
               cte2(C) As (Select Top (32) Char(Row_Number() over (Order By (Select NULL))-1) From cte1 a,cte1 b)
        Select @S = Replace(@S,C,' ')
         From  cte2
    
        Return ltrim(rtrim(replace(replace(replace(@S,' ','†‡'),'‡†',''),'†‡',' ')))
    End
    --Select [dbo].[udf-Str-Strip-Control]('Michael        '+char(13)+char(10)+'LastName')  --Returns: Michael LastName