I am working on regex_substr
to get the following result in oracle
Test 1.Input: JOHN 10BC STUDENT Desired Output: JOHN
Test 2.Input: JOHN STUDENT Desired Output: JOHN
Test 3.Input: JOHN 10BC STU Desired Output: JOHN
Test 4.Input: JOHN 10BC TEACHER Desired Output:NULL
Test 5.Input: JOHN TEACHER Desired Output:NULL
Test 6. Input: MR JOHN 08BC STU Desired Output: MR JOHN
Test 7. Input: MR JOHN STUDENT Desired Output: MR JOHN
Test 8. Input: MR JOHN 07BC TEACHER Desired Output: Null
Test 9. Input: MR STUART 06BC STDUENT Desired Output: MR STUART
Test 10. Input: MR STUART LEE 05BC STDUENT Desired Output: MR STUART LEE
Query:
Test 1:
select REGEXP_SUBSTR('JOHN 10BC STUDENT','(.*)(\s+.*BC)?\sSTU(DENT)?',1,1,'i',1) from dual;
Output: Failed . Returned JOHN 10BC instead of JOHN
Test 2:
select REGEXP_SUBSTR('JOHN STUDENT','(.*)(\s+.*BC)?\sSTU(DENT)?',1,1,'i',1) from dual;
Output: Passed. Returned JOHN
I modified the query as below by removing the ? in the second block
Test 1:select REGEXP_SUBSTR('JOHN 10BC STUDENT','(.*)(\s+.*BC)\sSTU(DENT)?',1,1,'i',1) from dual;
Output: Passed. Returned JOHN
Test 2:select REGEXP_SUBSTR('JOHN STUDENT','(.*)(\s+.*BC)\sSTU(DENT)?',1,1,'i',1) from dual;
Output: Failed. Returned Null instead of JOHN.
How to ignore the middle optional word BC and always return the words till BC as a substring for STUDENT using REGEXP_SUBSTR
. Thanks in Advance!!!
Use non-greedy pattern for any-character match (.*
) by adding a question mark after it (.*?
) to prevent it from grabbing an optional part ((\s+.*BC)?
):
with test (id, col) as ( select 1, 'JOHN 10BC STUDENT' from dual union all select 2, 'JOHN STUDENT' from dual union all select 3, 'JOHN 10BC STU' from dual union all select 4, 'JOHN 10BC TEACHER' from dual union all select 4, 'JSTUOHN 10BC TEACHER' from dual union all select 5, 'JOHN TEACHER' from dual ) select id, col, regexp_substr(col, '(.*?)(\s+.*BC)?\sSTU(DENT)?',1,1,'i',1) as qwe from test
ID | COL | QWE -: | :------------------- | :---- 1 | JOHN 10BC STUDENT | JOHN 2 | JOHN STUDENT | JOHN 3 | JOHN 10BC STU | JOHN 4 | JOHN 10BC TEACHER | null 4 | JSTUOHN 10BC TEACHER | null 5 | JOHN TEACHER | null
db<>fiddle here
UPD. To allow multiple words as the first part, you may use the following regex:
(.+?)(\s+\S*BC)?\s+STU(DENT)?$
Explanation:
(.+?)
- match any character one or more times with the shortest match (allowing following optional groups to be matched).(\s+\S*BC)?
- optional match for the group of zero or more non-whitespace (\S
) characters followed by BC
. The group should be separated from the previous non-whitespace character by one or more whitespace characters. \S*
instead of .*
will prevent from inclusion of multiple words into this group.\s+STU(DENT)?$
- the last part should be a word STUDENT
or it's abbreviation STU
. It should be the last word in the line ($
right after it).And the result of extended test case is:
with test (id, col) as ( select 1, 'JOHN 10BC STUDENT' from dual union all select 2, 'JOHN STUDENT' from dual union all select 3, 'JOHN 10BC STU' from dual union all select 4, 'JOHN 10BC TEACHER' from dual union all select 4, 'JSTUOHN 10BC TEACHER' from dual union all select 5, 'JOHN TEACHER' from dual union all select 6, 'MR JOHN 10BC STU' from dual union all select 7, 'MR STUART ABC 10BC STUDENT' from dual union all select 8, 'MR STUART ABC 10BC STUDENTS' from dual union all select 9, 'MR STUART STU 10BC TEACHER' from dual ) select id, col, regexp_substr(col, '(.*?)(\s+\S*BC)?\s+STU(DENT)?$',1,1,'i',1) as qwe from test
ID | COL | QWE -: | :-------------------------- | :------------ 1 | JOHN 10BC STUDENT | JOHN 2 | JOHN STUDENT | JOHN 3 | JOHN 10BC STU | JOHN 4 | JOHN 10BC TEACHER | null 4 | JSTUOHN 10BC TEACHER | null 5 | JOHN TEACHER | null 6 | MR JOHN 10BC STU | MR JOHN 7 | MR STUART ABC 10BC STUDENT | MR STUART ABC 8 | MR STUART ABC 10BC STUDENTS | null 9 | MR STUART STU 10BC TEACHER | null
db<>fiddle here