Search code examples
sql-serverregex

Regex pattern inside SQL Replace function?


SELECT REPLACE('<strong>100</strong><b>.00 GB', '%^(^-?\d*\.{0,1}\d+$)%', '');

I want to replace any markup between two parts of the number with above regex, but it does not seem to work. I'm not sure if it is regex syntax that's wrong because I tried simpler one such as '%[^0-9]%' just to test but it didn't work either. Does anyone know how can I achieve this?


Solution

  • You can use PATINDEX to find the first index of the pattern (string's) occurrence. Then use STUFF to stuff another string into the pattern(string) matched.

    Loop through each row. Replace each illegal characters with what you want. In your case replace non numeric with blank. The inner loop is if you have more than one illegal character in a current cell that of the loop.

    DECLARE @counter int
    
    SET @counter = 0
    
    WHILE(@counter < (SELECT MAX(ID_COLUMN) FROM Table))
    BEGIN  
       
        WHILE 1 = 1
        BEGIN
            DECLARE @RetVal varchar(50)
            
            SET @RetVal =  (SELECT Column = STUFF(Column, PATINDEX('%[^0-9.]%', Column),1, '')
            FROM Table
            WHERE ID_COLUMN = @counter)
            
            IF(@RetVal IS NOT NULL)       
              UPDATE Table SET
              Column = @RetVal
              WHERE ID_COLUMN = @counter
            ELSE
                break
        END
       
        SET @counter = @counter + 1
    END
    

    Caution: This is slow though! Having a varchar column may impact. So using LTRIM RTRIM may help a bit. Regardless, it is slow.

    Credit goes to this Stack Overflow answer.

    Credit also goes to @srutzky

    Instead of doing one row at a time, this answer can be adapted to a more set-based solution. It still iterates the max of the number of non-numeric characters in a single row, so it's not ideal, but I think it should be acceptable in most situations.

    WHILE 1 = 1 BEGIN
        WITH q AS
            (SELECT ID_Column, PATINDEX('%[^0-9.]%', Column) AS n
            FROM Table)
        UPDATE Table
        SET Column = STUFF(Column, q.n, 1, '')
        FROM q
        WHERE Table.ID_Column = q.ID_Column AND q.n != 0;
        
        IF @@ROWCOUNT = 0 BREAK;
    END;
    

    You can also improve efficiency quite a lot if you maintain a bit column in the table that indicates whether the field has been scrubbed yet. (NULL represents "Unknown" in my example and should be the column default.)

    DECLARE @done bit = 0;
    WHILE @done = 0 BEGIN
        WITH q AS
            (SELECT ID_Column, PATINDEX('%[^0-9.]%', Column) AS n
            FROM Table
            WHERE COALESCE(Scrubbed_Column, 0) = 0)
        UPDATE Table
        SET Column = STUFF(Column, q.n, 1, ''),
            Scrubbed_Column = 0
        FROM q
        WHERE Table.ID_Column = q.ID_Column AND q.n != 0;
    
        IF @@ROWCOUNT = 0 SET @done = 1;
    
        -- if Scrubbed_Column is still NULL, then the PATINDEX
        -- must have given 0
        UPDATE table
        SET Scrubbed_Column = CASE
            WHEN Scrubbed_Column IS NULL THEN 1
            ELSE NULLIF(Scrubbed_Column, 0)
        END;
    END;
    

    If you don't want to change your schema, this is easy to adapt to store intermediate results in a table valued variable which gets applied to the actual table at the end.