Search code examples
sqlregexoracle-databaseregexp-replace

Using regex in Oracle sql


I want to extract 2 parts of certain string. For this I want to use regular expressions. For Example the String is: Big_Dog_0044_0080 and my Regex is: Big_Dog_([0-9]+)_?([0-9A-Z]*) Group 1 would be then 0044 and the second would be 0080.

Now my problem is to bring this into an select statement: I tried it with regex_substr.

This was my result:

 select 
'Big_Dog_0044_0080' as TestString,
regexp_substr('Big_Dog_0044_0080', '([0-9]+)') Group1 , 
regexp_substr('Big_Dog_0044_0080', '([0-9A-Z]*)') Group2 from dual;

My Output:

TESTSTRING         Group1 Group2
Big_Dog_0044_0080   0044    B

First of all Group 2 is wrong i would expect 0080. What I am doing wrong here?

And second question, any other way to solve this? I have like 5-6 different regexp I want to use. For example next one would be ([A-C])_CatWeezle

I mean like select the input string column and go through the possible regexp. If one regexp matches, then stop and extract values.

Thank you!


Solution

  • What I am doing wrong here?

    The regular expression '([0-9A-Z]*)' will match the first occurrence of zero-or-more digit or upper-case alphabetic characters in the string. For your string 'Big_Dog_0044_0080' the first character is the upper-case alphabetic character B so that will get matched and the second character i does not match your regular expression so it is not included in the group.

    You state that your string matches the pattern:

    Big_Dog_([0-9]+)_?([0-9A-Z]*)
    

    Then you can anchor the pattern to the end of the string using $ and can use:

    WITH test_data (value) AS (
      SELECT 'Big_Dog_0044_0080' FROM DUAL UNION ALL
      SELECT 'Big_Dog_00440080' FROM DUAL UNION ALL
      SELECT 'Big_Dog_00440A80' FROM DUAL
    )
    SELECT value,
           REGEXP_SUBSTR(value, '([0-9]+)_?([0-9A-Z]*)$', 1, 1, NULL, 1) AS Group1 , 
           REGEXP_SUBSTR(value, '([0-9]+)_?([0-9A-Z]*)$', 1, 1, NULL, 2) AS Group2
    FROM   test_data;
    

    Which outputs:

    VALUE GROUP1 GROUP2
    Big_Dog_0044_0080 0044 0080
    Big_Dog_00440080 00440080
    Big_Dog_00440A80 00440 A80

    (Note: the second row is matched entirely by the first group and the second group has zero-width and the third row will match the first group until it finds a non-digit character and then start the second group.)

    If the delimiting underscore is optional then you may want to use fixed-width matches (assuming the substrings are each 4-characters):

    WITH test_data (value) AS (
      SELECT 'Big_Dog_0044_0080' FROM DUAL UNION ALL
      SELECT 'Big_Dog_00440080' FROM DUAL UNION ALL
      SELECT 'Big_Dog_00440A80' FROM DUAL
    )
    SELECT value,
           REGEXP_SUBSTR(value, '([0-9]{4})_?([0-9A-Z]{4})$', 1, 1, NULL, 1) AS Group1 , 
           REGEXP_SUBSTR(value, '([0-9]{4})_?([0-9A-Z]{4})$', 1, 1, NULL, 2) AS Group2
    FROM   test_data;
    

    Which outputs:

    VALUE GROUP1 GROUP2
    Big_Dog_0044_0080 0044 0080
    Big_Dog_00440080 0044 0080
    Big_Dog_00440A80 0044 0A80

    I mean like select the input string column and go through the possible regexp. If one regexp matches, then stop and extract values.

    Use a CASE expression:

    WITH test_data (value) AS (
      SELECT 'Big_Dog_0044_0080' FROM DUAL UNION ALL
      SELECT 'Big_Dog_00440080' FROM DUAL UNION ALL
      SELECT 'Big_Dog_00440A80' FROM DUAL UNION ALL
      SELECT 'A_CatWeezle' FROM DUAL
    )
    SELECT value,
           CASE
           WHEN REGEXP_LIKE(value, '^Big_Dog_(\d{4})_?([0-9A-Z]{4})$')
           THEN REGEXP_SUBSTR(value, '^Big_Dog_(\d{4})_?([0-9A-Z]{4})$', 1, 1, NULL, 1)
           WHEN REGEXP_LIKE(value, '^([A-C])_CatWeezle$')
           THEN REGEXP_SUBSTR(value, '^([A-C])_CatWeezle$', 1, 1, NULL, 1)
           END AS group1,
           CASE
           WHEN REGEXP_LIKE(value, '^Big_Dog_(\d{4})_?([0-9A-Z]{4})$')
           THEN REGEXP_SUBSTR(value, '^Big_Dog_(\d{4})_?([0-9A-Z]{4})$', 1, 1, NULL, 2)
           END AS group2
    FROM   test_data;
    

    Outputs:

    VALUE GROUP1 GROUP2
    Big_Dog_0044_0080 0044 0080
    Big_Dog_00440080 0044 0080
    Big_Dog_00440A80 0044 0A80
    A_CatWeezle A

    db<>fiddle here