Search code examples
sqlsql-serversql-server-2008

Search multiple occurrences of a word in a string


I have the below text in a column in a SQL Server table.

"55-9988, Version 1.0 dated 07/20/2009
3684 for 66-0022
IB from Microsoft , for Monovalent A/king Influenza Subvirion Vaccine, Version 1.0 dated 06/27/2009
Package Insert from Microsoft , for Fluzone, dated 06/2008
Package Insert from Microsoft , for H5N1, dated 04/2007
IB from google, for AS93 as an Adjuvant for use with a  king Vaccine, Version 1.0 dated 07/2009
Package Insert from Microsoft , for Fluzone, Version 37 dated 06/18/2008
55-9988 MIA, Version 1.0 dated 07/20/2009"

This is an example from one cell and I have multiple rows with similar text.

In this case, I need to obtain the pattern xx-xxxx (55-9988, 66-0022) from the above string and I need all occurrences.

I'm able to obtain the first instance of the occurrence, but I'm unable to get the remaining occurrences from the text.


Solution

  • Please try the following solution based on the STRING_SPLIT() function use.

    Also, it is using an enhanced TRIM() function. So, it will work starting from SQL Server 2017 onwards.

    LIKE '[0-9][0-9]-[0-9][0-9][0-9][0-9]%' clause is looking for tokens that are matching a particular pattern.

    I added Method #2 XML based implementation to support SQL Server 2008.

    SQL

    -- DDL and sample data population, start
    DECLARE @tbl TABLE (id INT IDENTITY PRIMARY KEY, tokens VARCHAR(1024));
    INSERT INTO @tbl (tokens) VALUES
    ('55-9988, Version 1.0 dated 07/20/2009
    3684 for 66-0022
    IB from Microsoft , for Monovalent A/king Influenza Subvirion Vaccine, Version 1.0 dated 06/27/2009
    Package Insert from Microsoft , for Fluzone, dated 06/2008
    Package Insert from Microsoft , for H5N1, dated 04/2007
    IB from google, for AS93 as an Adjuvant for use with a  king Vaccine, Version 1.0 dated 07/2009
    Package Insert from Microsoft , for Fluzone, Version 37 dated 06/18/2008
    55-9988 MIA, Version 1.0 dated 07/20/2009'),
    ('fafa');
    -- DDL and sample data population, end
    
    -- Method #1
    -- SQL Server 2017 onwards
    DECLARE @CrLf CHAR(2) = CHAR(13) + CHAR(10);
    
    SELECT id, token
    FROM @tbl
    CROSS APPLY STRING_SPLIT(REPLACE(tokens, @CrLf,SPACE(1)), SPACE(1))
    CROSS APPLY (SELECT TRIM(',' FROM value)) AS t(token)
    WHERE token LIKE '[0-9][0-9]-[0-9][0-9][0-9][0-9]%';
    
    -- Method #2
    -- SQL Server 2008 onwards
    DECLARE @separator CHAR(1) = SPACE(1);
    
    SELECT id, REPLACE(token, ',', '') AS token
    FROM @tbl
    CROSS APPLY (SELECT CAST('<root><r><![CDATA[' + 
          REPLACE(REPLACE(tokens, @CrLf,@separator), @separator, ']]></r><r><![CDATA[') + 
          ']]></r></root>' AS XML)) AS t1(c)
    CROSS APPLY c.nodes('/root/r/text()') AS t2(x)
    CROSS APPLY (SELECT x.value('.', 'VARCHAR(50)')) AS t3(token)
    WHERE token LIKE '[0-9][0-9]-[0-9][0-9][0-9][0-9]%';
    

    Output

    id token
    1 55-9988
    1 66-0022
    1 55-9988