I have the column in a table in the below format
ID Comments
img screenshot as request by comments
The comments column has multiple rows and a sample text from one comments column is below:
"55-9988, Version 1.0 dated 07/20/2009
3684 for 66-0022
IB from Microsoft , for Monovalent A/king Influenza Subvirion Vaccine, Version 1.0 dated 06/27/2009
Package Insert from Microsoft , for Fluzone, dated 06/2008
Package Insert from Microsoft , for H5N1, dated 04/2007
IB from google, for AS93 as an Adjuvant for use with a king Vaccine, Version 1.0 dated 07/2009
Package Insert from Microsoft , for Fluzone, Version 37 dated 06/18/2008
55-9988 MIA, Version 1.0 dated 07/20/2009"
My task is to append the string text "Product No" right before the string pattern xx-xxxx (ex in the column above 66-0022).
The thing is this pattern xx-xxxx occurs more than once in each columns.
I was able to extract the multiple occurrences with help of the below code.
DECLARE @tbl TABLE (id INT IDENTITY PRIMARY KEY, tokens VARCHAR(1024));
INSERT INTO @tbl (tokens) VALUES
('55-9988, Version 1.0 dated 07/20/2009
3684 for 66-0022
IB from Microsoft , for Monovalent A/king Influenza Subvirion Vaccine, Version 1.0 dated 06/27/2009
Package Insert from Microsoft , for Fluzone, dated 06/2008
Package Insert from Microsoft , for H5N1, dated 04/2007
IB from google, for AS93 as an Adjuvant for use with a king Vaccine, Version 1.0 dated 07/2009
Package Insert from Microsoft , for Fluzone, Version 37 dated 06/18/2008
55-9988 MIA, Version 1.0 dated 07/20/2009'),
('fafa');
-- DDL and sample data population, end
-- Method #1
-- SQL Server 2017 onwards
DECLARE @CrLf CHAR(2) = CHAR(13) + CHAR(10);
SELECT id, token
FROM @tbl
CROSS APPLY STRING_SPLIT(REPLACE(tokens, @CrLf,SPACE(1)), SPACE(1))
CROSS APPLY (SELECT TRIM(',' FROM value)) AS t(token)
WHERE token LIKE '[0-9][0-9]-[0-9][0-9][0-9][0-9]%';
-- Method #2
-- SQL Server 2008 onwards
DECLARE @separator CHAR(1) = SPACE(1);
SELECT id, REPLACE(token, ',', '') AS token
FROM @tbl
CROSS APPLY (SELECT CAST('<root><r><![CDATA[' +
REPLACE(REPLACE(tokens, @CrLf,@separator), @separator, ']]></r><r><![CDATA[') +
']]></r></root>' AS XML)) AS t1(c)
CROSS APPLY c.nodes('/root/r/text()') AS t2(x)
CROSS APPLY (SELECT x.value('.', 'VARCHAR(50)')) AS t3(token)
WHERE token LIKE '[0-9][0-9]-[0-9][0-9][0-9][0-9]%';
But this is splitting the instance into multiple rows and I'm not sure how I can achieve my objective of appending the string text of "Product No" prior to every instance of occurrence. Please advise and suggest if there is any other better approach.
I suggest recursive add your prefix to pattern matches.
In each step we move current pattern match position
r.idx+@prefixLen+@patLen+ t.cidx idx
New index is <current pattern position>+<prev index>+<length of prefix>+<length of pattern>
Then take last row as row_number()=1 ordered by level of recursion desc
from(
select *,row_number()over(partition by id order by lvl desc)rn
from r
)a
where rn=1
This cross apply used only to simplify formulas
cross apply( values
(patindex('%[0-9][0-9]-[0-9][0-9][0-9][0-9]%'
,substring(newtokens,r.idx+@prefixLen+@patLen+1,tln)))
)t(cidx)
See example:
Test data:
id | tokens |
---|---|
1 | 55-9988, Version 1.0 dated 07/20/2009 3684 for 66-0022 IB from Microsoft , for Monovalent A/king Influenza Subvirion Vaccine, Version 1.0 dated 06/27/2009 Package Insert from Microsoft , for Fluzone, dated 06/2008 Package Insert from Microsoft , for H5N1, dated 04/2007 IB from google, for AS93 as an Adjuvant for use with a king Vaccine, Version 1.0 dated 07/2009 Package Insert from Microsoft , for Fluzone, Version 37 dated 06/18/2008 55-9988 MIA, Version 1.0 dated 07/20/2009 |
2 | But this is splitting the instance 11-1111 into multiple rows and I'm not sure 22-2222 how I can achieve my objective of appending the string text of "Product No" prior to every instance of occurrence. Please advise and suggest if there is any other better approach.99-1234 |
3 | fafa |
DECLARE @prefix varchar(20)='Product with No';
DECLARE @prefixLen int=len(@prefix);
DECLARE @patLen int=7;
with r as(
select 1 lvl,id ,len(tokens) tln
,cidx idx
,cast(case when cidx>0 then
substring(tokens,1,cidx-1)
+@prefix+substring(tokens,cidx,len(tokens))
else tokens
end as varchar(1000)) newtokens
from tbl
cross apply( values(patindex('%[0-9][0-9]-[0-9][0-9][0-9][0-9]%',tokens)) )t(cidx)
union all
select lvl+1 lvl,id ,tln
,r.idx+@prefixLen+@patLen+ t.cidx idx
,cast(case when t.cidx>0 then
substring(newtokens,1,r.idx+@prefixLen+@patLen+t.cidx-1)
+@prefix+substring(newtokens,t.cidx+r.idx+@prefixLen+@patLen,tln)
else newtokens
end as varchar(1000)) newtokens
from r
cross apply( values
(patindex('%[0-9][0-9]-[0-9][0-9][0-9][0-9]%'
,substring(newtokens,r.idx+@prefixLen+@patLen+1,tln)))
)t(cidx)
where t.cidx>0
and lvl<6
)
select *
from(
select *,row_number()over(partition by id order by lvl desc)rn
from r
)a
where rn=1
order by id,lvl;
Output with prefix 'Product No'
lvl | id | newtokens | rn |
---|---|---|---|
3 | 1 | Product No55-9988, Version 1.0 dated 07/20/2009 3684 for Product No66-0022 IB from Microsoft , for Monovalent A/king Influenza Subvirion Vaccine, Version 1.0 dated 06/27/2009 Package Insert from Microsoft , for Fluzone, dated 06/2008 Package Insert from Microsoft , for H5N1, dated 04/2007 IB from google, for AS93 as an Adjuvant for use with a king Vaccine, Version 1.0 dated 07/2009 Package Insert from Microsoft , for Fluzone, Version 37 dated 06/18/2008 Product No55-9988 MIA, Version 1.0 dated 07/20/2009 |
1 |
3 | 2 | But this is splitting the instance Product No11-1111 into multiple rows and I'm not sure Product No22-2222 how I can achieve my objective of appending the string text of "Product No" prior to every instance of occurrence. Please advise and suggest if there is any other better approach.Product No99-1234 |
1 |
1 | 3 | fafa | 1 |