Ok I have a bunch of data all of which contains a code in the text but it is not all correctly formatted for example:
Well at Wallgreens Regular Strength Antacid Liquid (Alumina Magnesia Simethicone Antacid & Anti Gas) Mint a)12 oz bottle (NDC 0363-0073-02) b) 26 oz bottle (NDC 0363-0073-26) Distributed by Walgreens CO 200 Wilmot Rd Deerfield IL 60015
IDPN (Intradialytic Parenteral Nutrition - dialysate solution with added amino acids) a) 490mL bag b) 500mL bag and c) 590mL bag Pentec Health Inc 4 Creek Parkway Suite A Boothwyn PA 19061-3132
Aminosyn-PF (amino acids) 7% Sulfite-Free 500 mL Bags Rx Only Hospira Inc Lake Forest IL 60045 NDC: 0409-4178-03 Barcode (01) 0 030409 417803 5
I am only interested in the 8-9 digits formatted like so:
xxxx-xxxx or xxxxx-xxxx
I have currently selected these entries by using:
WHERE [Product Description] LIKE '%[0-9][0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]%' OR [Product Description] LIKE '%[0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]%'
but I would like output the string it has matched with the like not the whole Product Description only the code it finds for example:
0363-0073
19061-3132
0409-4178
This is a slightly different method, which doesn't use the UNION ALL
:
WITH VTE AS (
SELECT *
FROM (VALUES ('Well at Wallgreens Regular Strength Antacid Liquid (Alumina Magnesia Simethicone Antacid & Anti Gas) Mint a)12 oz bottle (NDC 0363-0073-02) b) 26 oz bottle (NDC 0363-0073-26) Distributed by Walgreens CO 200 Wilmot Rd Deerfield IL 60015'),
('IDPN (Intradialytic Parenteral Nutrition - dialysate solution with added amino acids) a) 490mL bag b) 500mL bag and c) 590mL bag Pentec Health Inc 4 Creek Parkway Suite A Boothwyn PA 19061-3132'),
('Aminosyn-PF (amino acids) 7% Sulfite-Free 500 mL Bags Rx Only Hospira Inc Lake Forest IL 60045 NDC: 0409-4178-03 Barcode (01) 0 030409 417803 5')) V(S))
SELECT V.S,
CASE WHEN PI1.C > 0 THEN SUBSTRING(V.S,PI1.C, 10)
WHEN PI2.C > 0 THEN SUBSTRING(V.S,PI2.C, 9)
ELSE NULL
END AS N
FROM VTE V
CROSS APPLY (VALUES(PATINDEX('%[0-9][0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]%',V.S))) PI1(C)
CROSS APPLY (VALUES(PATINDEX('%[0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]%',V.S))) PI2(C);
The reason for the 2 PATINDEX
's is because the values 12345-6789
would fulfil the pattern '%[0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]%'
. Thus the check for the 10 character format is completed first, and then the 9 character. The CASE
expression also avoids an error if neither pattern is found, as if both PI1.C
and PI2.C
return 0
(meaning the pattern wasn't found) then NULL
is returned.