Search code examples
sqlreplacesql-server-2016

Keep only allowed characters in a string


If I know which characters I do not want to keep in my string I can easily remove them with REPLACE:

SELECT REPLACE(
         REPLACE(
           REPLACE(
             'String with characters like #§$ I do not want to keep', 
             '#', ''
           ), '§', ''
         ), '$', ''
       ) AS repl_string

--- String with characters like  I do not want to keep

But, what if I want to remove every character which is not part of a positive list? With a regex I would so something like s/[^a-zA-Z0-9 :.]//g (assuming that I would like to allow only letters, numbers, a space , a dot . or a colon :)

I am looking for a solution for Microsoft SQL Server 2016


Solution

  • This answer uses a very similar solution to the answer I linked above, however, as the OP is using SQL Server 2016, they can't use STRING_AGG. As such you'll need to use the "old" FOR XML PATH (and STUFF) method to reaggregate the string:

    CREATE OR ALTER FUNCTION [dbo].[PatternCharacterReplace_XML] (@String varchar(8000), @Pattern varchar(100), @ReplacementCharacter varchar(1)) 
    RETURNS table
    AS RETURN
        WITH N AS(
            SELECT N
            FROM (VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL))N(N)),
        Tally AS(
            SELECT TOP(LEN(@String))
                   ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS I
            FROM N N1, N N2, N N3, N N4)
        SELECT (SELECT CASE WHEN V.C LIKE @Pattern THEN @ReplacementCharacter ELSE V.C END
                FROM Tally T
                     CROSS APPLY (VALUES(SUBSTRING(@String,T.I,1)))V(C)
                ORDER BY T.I
                FOR XML PATH(''),TYPE).value('(./text())[1]','varchar(8000)') AS ReplacedString;
    GO
    
    
    SELECT *
    FROM dbo.PatternCharacterReplace_XML('String with characters like #§$ I do not want to keep','[^A-Za-z0-9 .:]','');
    

    Note that this particular version uses varchar values. If your value is an nvarchar I would suggest creating a separate version that uses that data type for the parameters.