Search code examples
sqlsql-server

Append string text based on pattern search SQL


I have the column in a table in the below format

ID Comments

img screenshot as request by comments

The comments column has multiple rows and a sample text from one comments column is below:

"55-9988, Version 1.0 dated 07/20/2009
3684 for 66-0022
IB from Microsoft , for Monovalent A/king Influenza Subvirion Vaccine, Version 1.0 dated 06/27/2009
Package Insert from Microsoft , for Fluzone, dated 06/2008
Package Insert from Microsoft , for H5N1, dated 04/2007
IB from google, for AS93 as an Adjuvant for use with a  king Vaccine, Version 1.0 dated 07/2009
Package Insert from Microsoft , for Fluzone, Version 37 dated 06/18/2008
55-9988 MIA, Version 1.0 dated 07/20/2009"

My task is to append the string text "Product No" right before the string pattern xx-xxxx (ex in the column above 66-0022).

The thing is this pattern xx-xxxx occurs more than once in each columns.

I was able to extract the multiple occurrences with help of the below code.

DECLARE @tbl TABLE (id INT IDENTITY PRIMARY KEY, tokens VARCHAR(1024));
INSERT INTO @tbl (tokens) VALUES
('55-9988, Version 1.0 dated 07/20/2009
3684 for 66-0022
IB from Microsoft , for Monovalent A/king Influenza Subvirion Vaccine, Version 1.0 dated 06/27/2009
Package Insert from Microsoft , for Fluzone, dated 06/2008
Package Insert from Microsoft , for H5N1, dated 04/2007
IB from google, for AS93 as an Adjuvant for use with a  king Vaccine, Version 1.0 dated 07/2009
Package Insert from Microsoft , for Fluzone, Version 37 dated 06/18/2008
55-9988 MIA, Version 1.0 dated 07/20/2009'),
('fafa');
-- DDL and sample data population, end

-- Method #1
-- SQL Server 2017 onwards
DECLARE @CrLf CHAR(2) = CHAR(13) + CHAR(10);

SELECT id, token
FROM @tbl
CROSS APPLY STRING_SPLIT(REPLACE(tokens, @CrLf,SPACE(1)), SPACE(1))
CROSS APPLY (SELECT TRIM(',' FROM value)) AS t(token)
WHERE token LIKE '[0-9][0-9]-[0-9][0-9][0-9][0-9]%';

-- Method #2
-- SQL Server 2008 onwards
DECLARE @separator CHAR(1) = SPACE(1);

SELECT id, REPLACE(token, ',', '') AS token
FROM @tbl
CROSS APPLY (SELECT CAST('<root><r><![CDATA[' + 
      REPLACE(REPLACE(tokens, @CrLf,@separator), @separator, ']]></r><r><![CDATA[') + 
      ']]></r></root>' AS XML)) AS t1(c)
CROSS APPLY c.nodes('/root/r/text()') AS t2(x)
CROSS APPLY (SELECT x.value('.', 'VARCHAR(50)')) AS t3(token)
WHERE token LIKE '[0-9][0-9]-[0-9][0-9][0-9][0-9]%';

But this is splitting the instance into multiple rows and I'm not sure how I can achieve my objective of appending the string text of "Product No" prior to every instance of occurrence. Please advise and suggest if there is any other better approach.


Solution

  • I suggest recursive add your prefix to pattern matches.

    • search pattern and add (insert) prefix in this position
    • recursively search pattern in substring of tokens from precious match to end of tokens and add prefix in this position

    In each step we move current pattern match position

        r.idx+@prefixLen+@patLen+ t.cidx idx
    

    New index is <current pattern position>+<prev index>+<length of prefix>+<length of pattern>

    Then take last row as row_number()=1 ordered by level of recursion desc

    from(
      select *,row_number()over(partition by id order by lvl desc)rn
      from r 
    )a
    where rn=1
    

    This cross apply used only to simplify formulas

       cross apply( values
        (patindex('%[0-9][0-9]-[0-9][0-9][0-9][0-9]%'
                   ,substring(newtokens,r.idx+@prefixLen+@patLen+1,tln))) 
                 )t(cidx)
    

    See example:
    Test data:

    id tokens
    1 55-9988, Version 1.0 dated 07/20/2009
    3684 for 66-0022
    IB from Microsoft , for Monovalent A/king Influenza Subvirion Vaccine, Version 1.0 dated 06/27/2009
    Package Insert from Microsoft , for Fluzone, dated 06/2008
    Package Insert from Microsoft , for H5N1, dated 04/2007
    IB from google, for AS93 as an Adjuvant for use with a king Vaccine, Version 1.0 dated 07/2009
    Package Insert from Microsoft , for Fluzone, Version 37 dated 06/18/2008
    55-9988 MIA, Version 1.0 dated 07/20/2009
    2 But this is splitting the instance 11-1111
      into multiple rows and I'm not sure
      22-2222 how I can achieve my objective
      of appending the string text of "Product No"
      prior to every instance of occurrence.
      Please advise and suggest
      if there is any other better approach.99-1234
    3 fafa
    DECLARE @prefix varchar(20)='Product with No';
    DECLARE @prefixLen int=len(@prefix);
    DECLARE @patLen int=7;
    
    with r as(
       select 1 lvl,id ,len(tokens) tln
         ,cidx idx
         ,cast(case when cidx>0 then
             substring(tokens,1,cidx-1)
            +@prefix+substring(tokens,cidx,len(tokens))
          else tokens
          end as varchar(1000)) newtokens
       from tbl
       cross apply( values(patindex('%[0-9][0-9]-[0-9][0-9][0-9][0-9]%',tokens)) )t(cidx)
       union all
       select lvl+1 lvl,id ,tln
            ,r.idx+@prefixLen+@patLen+ t.cidx idx
         ,cast(case when t.cidx>0 then
             substring(newtokens,1,r.idx+@prefixLen+@patLen+t.cidx-1)
            +@prefix+substring(newtokens,t.cidx+r.idx+@prefixLen+@patLen,tln)
          else newtokens
          end as varchar(1000)) newtokens
       from r
       cross apply( values
        (patindex('%[0-9][0-9]-[0-9][0-9][0-9][0-9]%'
                   ,substring(newtokens,r.idx+@prefixLen+@patLen+1,tln))) 
                 )t(cidx)
       where  t.cidx>0  
       and lvl<6
    )
    select *
    from(
      select *,row_number()over(partition by id order by lvl desc)rn
      from r 
    )a
    where rn=1
    order by id,lvl;
    

    Output with prefix 'Product No'

    lvl id newtokens rn
    3 1 Product No55-9988, Version 1.0 dated 07/20/2009
    3684 for Product No66-0022
    IB from Microsoft , for Monovalent A/king Influenza Subvirion Vaccine, Version 1.0 dated 06/27/2009
    Package Insert from Microsoft , for Fluzone, dated 06/2008
    Package Insert from Microsoft , for H5N1, dated 04/2007
    IB from google, for AS93 as an Adjuvant for use with a king Vaccine, Version 1.0 dated 07/2009
    Package Insert from Microsoft , for Fluzone, Version 37 dated 06/18/2008
    Product No55-9988 MIA, Version 1.0 dated 07/20/2009
    1
    3 2 But this is splitting the instance Product No11-1111
      into multiple rows and I'm not sure
      Product No22-2222 how I can achieve my objective
      of appending the string text of "Product No"
      prior to every instance of occurrence.
      Please advise and suggest
      if there is any other better approach.Product No99-1234
    1
    1 3 fafa 1

    fiddle