Search code examples
sql-serverstringpattern-matchingsql-like

Grabbing number from text where meets specific critera


Ok I have a bunch of data all of which contains a code in the text but it is not all correctly formatted for example:

Well at Wallgreens Regular Strength Antacid Liquid (Alumina Magnesia Simethicone Antacid & Anti Gas) Mint a)12 oz bottle (NDC 0363-0073-02) b) 26 oz bottle (NDC 0363-0073-26) Distributed by Walgreens CO 200 Wilmot Rd Deerfield IL 60015

IDPN (Intradialytic Parenteral Nutrition - dialysate solution with added amino acids) a) 490mL bag b) 500mL bag and c) 590mL bag Pentec Health Inc 4 Creek Parkway Suite A Boothwyn PA 19061-3132

Aminosyn-PF (amino acids) 7% Sulfite-Free 500 mL Bags Rx Only Hospira Inc Lake Forest IL 60045 NDC: 0409-4178-03 Barcode (01) 0 030409 417803 5

I am only interested in the 8-9 digits formatted like so:

xxxx-xxxx or xxxxx-xxxx

I have currently selected these entries by using:

WHERE [Product Description] LIKE '%[0-9][0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]%' OR [Product Description] LIKE  '%[0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]%'

but I would like output the string it has matched with the like not the whole Product Description only the code it finds for example:

0363-0073

19061-3132

0409-4178


Solution

  • This is a slightly different method, which doesn't use the UNION ALL:

    WITH VTE AS (
        SELECT *
        FROM (VALUES ('Well at Wallgreens Regular Strength Antacid Liquid (Alumina Magnesia Simethicone Antacid & Anti Gas) Mint a)12 oz bottle (NDC 0363-0073-02) b) 26 oz bottle (NDC 0363-0073-26) Distributed by Walgreens CO 200 Wilmot Rd Deerfield IL 60015'),
                     ('IDPN (Intradialytic Parenteral Nutrition - dialysate solution with added amino acids) a) 490mL bag b) 500mL bag and c) 590mL bag Pentec Health Inc 4 Creek Parkway Suite A Boothwyn PA 19061-3132'),
                     ('Aminosyn-PF (amino acids) 7% Sulfite-Free 500 mL Bags Rx Only Hospira Inc Lake Forest IL 60045 NDC: 0409-4178-03 Barcode (01) 0 030409 417803 5')) V(S))
    SELECT V.S,
           CASE WHEN PI1.C > 0 THEN SUBSTRING(V.S,PI1.C, 10)
                WHEN PI2.C > 0 THEN SUBSTRING(V.S,PI2.C, 9)
                ELSE NULL
           END AS N
    FROM VTE V
         CROSS APPLY (VALUES(PATINDEX('%[0-9][0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]%',V.S))) PI1(C)
         CROSS APPLY (VALUES(PATINDEX('%[0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]%',V.S))) PI2(C);
    

    The reason for the 2 PATINDEX's is because the values 12345-6789 would fulfil the pattern '%[0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]%'. Thus the check for the 10 character format is completed first, and then the 9 character. The CASE expression also avoids an error if neither pattern is found, as if both PI1.C and PI2.C return 0 (meaning the pattern wasn't found) then NULL is returned.