regex oracle-database plsql regexp-substr

How to extract the text before any of several specified alphanumeric whole words from a string in PL/SQL

How can I remove all characters after a specific alphanumeric value in a string? For example, "covid19 1st case" should become "covid19" when everything from "1st" onward is removed, and "covid19 2d case" should become "covid19" when everything from "2d" onward is removed. I am trying the query below:

select regexp_substr('covid19 1st case','[^1st]*') from dual;

but it returns "covid" as output. Any leads? Also, if we have a predefined list of alphanumeric values, can we do it in a single expression, i.e. remove everything after either 1st or 2d?

Thanks

Solution

You can use either of these:

select regexp_substr('covid19 1st case','^(.*?)\s+(1st|2d)($|\W)', 1, 1, NULL, 1) from dual;
select regexp_substr('covid19 1st case','^(.*?)\s*(^|\W)(1st|2d)($|\W)', 1, 1, NULL, 1) from dual;

See the regex demo #1 and regex demo #2.

The (^|\W) and ($|\W) groups are used instead of word boundaries, which the Oracle SQL regex engine does not support.
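To see what the trailing ($|\W) group buys you, here is a small sketch comparing the first pattern with and without it. The input string is made up for illustration ("1starter" is just a word that happens to begin with "1st"), and the column aliases are arbitrary.

```sql
-- Illustrative input only: '1starter' merely starts with the marker "1st".
select regexp_substr('covid19 1starter kit', '^(.*?)\s+(1st|2d)',       1, 1, null, 1) as without_guard, -- covid19 (partial word matched)
       regexp_substr('covid19 1starter kit', '^(.*?)\s+(1st|2d)($|\W)', 1, 1, null, 1) as with_guard     -- NULL (no whole-word match)
from dual;
```

Without the guard, "1st" is matched inside a longer word and the text is cut anyway; with it, the marker has to be followed by a non-word character or the end of the string.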

Details (second pattern):

^ - start of string
(.*?) - Group 1: zero or more characters other than line break characters, as few as possible
\s* - zero or more whitespaces (\s+ matches one or more)
(^|\W) - Group 2: start of string or a non-word char
(1st|2d) - Group 3: either 1st or 2d
($|\W) - Group 4: end of string or a non-word char.
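As a quick check of the breakdown above, the following sketch runs the second pattern over a few sample strings. The inline sample rows and the aliases txt and prefix are assumptions made for the example; the third row is there to show that the leading (^|\W) group rejects "1st" when it is only the tail of "21st".

```sql
-- Sample rows are illustrative; replace the inline view with your own table.
select txt,
       regexp_substr(txt, '^(.*?)\s*(^|\W)(1st|2d)($|\W)', 1, 1, null, 1) as prefix
from (
  select 'covid19 1st case'  as txt from dual union all  -- prefix: covid19
  select 'covid19 2d case'   as txt from dual union all  -- prefix: covid19
  select 'covid19 21st case' as txt from dual            -- prefix: NULL (no whole-word marker)
);
```

The last argument (1) asks REGEXP_SUBSTR to return Group 1 only, which is why the whitespace and the marker itself are not part of the result.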

Another variation is to use REGEXP_REPLACE (the pattern must then also match the rest of the string, so that it is removed):

select regexp_replace('covid19 1st case','^(.*?)\s*(\W|^)(1st|2d)(\W|$).*', '\1') from dual;

See this regex demo. In the replacement, \1 refers to the Group 1 value.
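One practical difference between the two approaches shows up when none of the markers is present: REGEXP_REPLACE returns the input unchanged, while REGEXP_SUBSTR returns NULL. A minimal sketch, using a made-up input with "3rd" (which is not in the marker list) and arbitrary column aliases:

```sql
-- '3rd' is not one of the markers, so neither pattern matches.
select regexp_replace('covid19 3rd case', '^(.*?)\s*(\W|^)(1st|2d)(\W|$).*', '\1')        as via_replace, -- covid19 3rd case (input returned as is)
       regexp_substr('covid19 3rd case', '^(.*?)\s*(^|\W)(1st|2d)($|\W)', 1, 1, null, 1) as via_substr   -- NULL
from dual;
```

Which behavior you want depends on the use case: pick REGEXP_REPLACE if unmatched strings should pass through untouched, or REGEXP_SUBSTR if a NULL should flag that no marker was found.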