Search code examples
sqloracle-databaseregexp-substr

Returning a specific substring from ORACLE SQL string using REGEXP_SUBSTR


I am having a difficult time trying to return a specific section of string from a field (BSE.NOTES) using REGEXT_SUBSTR. For my query, I have a VARCHAR2 field with a specific text that I would like to return:

Hospital Dept Name Case: SP12-34567 Authorizing Provider: D. K, MD MSCR Collected: 07/09/2021 12:49 PM Ordering Location: Hospital Received: 07/09/2021 03:23 PM Pathologist: D. L., MD Specimens: A) - Body part 1 B) - Body part 2

From this text, I need to return the string "Case: SP-***" for each record. I tried using the following code, but only get NULL values:

REGEXP_SUBSTR(BSE.NOTES, '(\S|^)(Case\:\s)([0-9]\-\[0-9])', 1, 1, NULL) AS CASE_NUMB

I am not very versed and using regexp_substr() so any help is greatly appreciated!


Solution

  • Your pattern is looking for a single numeric digit either side of the -, and doesn't allow for the 'SP' part; it will also only match after the start of the line or a non-whitespace character, and your parentheses don't look right. And you haven't specified that you want a sub-pattern (read more in the docs).

    This will get the value you said you wanted:

    select REGEXP_SUBSTR(BSE.NOTES, '(\s|^)(Case:\sSP[0-9]+-[0-9]+)', 1, 1, null, 2) AS CASE_NUMB
    from bse
    
    CASE_NUMB
    Case: SP12-34567

    Or if you actually only want the second part of that you can add more sub-patterns:

    select REGEXP_SUBSTR(BSE.NOTES, '(\s|^)(Case:\s(SP[0-9]+-[0-9]+))', 1, 1, null, 3) AS CASE_NUMB
    from bse
    
    CASE_NUMB
    SP12-34567

    fiddle

    Using regular expressions to extract data from fields like this is a bit error-prone though, particularly where it's essentially free-form text that could have odd capitalisation (you can make the third argument 'i' to help with that), spacing (maybe make the second \s optional, as \s?), typos or other oddities... And you may not need the anchor/whitespace at the start, but it might not hurt, particularly if you do go case-insensitive.