I am not an expert in regex expression and looking for help. Thanks in advance
I want to extract a formatted substring from a description column. Sample below
FROM
my testing on 456897 - Carol M. Smith, Ph.D.
my testing on 435670 - Ms. Paulina M. Hall
my testing on 980765 - Mr. John Smith
my testing on 14567 - Mrs. Lena C. Callum
my testing on 555777 - Dr. Paul F. Fairlake
234567 - Mr. Ryan M. Palmer, Sr.
123456 - Joyce R. Hilton, Ph.D.
TO
my testing on 456897 - C.Smith
my testing on 435670 - Ms. P. Hall
my testing on 980765 - Mr. J. Smith
my testing on 14567 - Mrs. L. Callum
my testing on 555777 - Dr. P. Fairlake
234567 - Mr. R. Palmer
123456 - J. Hilton
My query works for the first and last record. However, the ones with titles are a little bit complicated.
For the records with title, I need to keep the title the first name and the Last Name initial.
SELECT description,
CASE
WHEN REGEXP_LIKE(description, '(Mr\.|Ms\.|Mrs\.|Dr\.)') THEN REGEXP_REPLACE(description, '(Ms\.|Mr\.|Mrs\.|Dr\.[A-Z][a-z]+ [A-Z]\.)')
WHEN NOT REGEXP_LIKE(description, '(Mr\.|Ms\.|Mrs\.|Dr\.)') THEN REGEXP_REPLACE(description, '(\w)\w*\W+(\w)\w*\W+(\w+),.*', '\1. \3')
ELSE 'some other validation needed'
END AS order_regex
from mytable;
Again thanks for any suggestion. K
I would do it like this:
select
t1.*
,regexp_replace(
t1.description
,'([^-]+)-\s*((Mr|Ms|Mrs|Dr)[.]\s*)?(\w)\w*(\s[a-zA-Z.]*)*\s(\w+)(,.*|$)'
,'\1- \2\4. \6'
) subs
from t1
Short description of this regular expression:
([^-]+)-
- finds the first part of a substring ending with -
(subexpression #1)\s*
- any number of space characters((Mr|Ms|Mrs|Dr)[.]\s*)?
- checks if Mr.|Ms.|Mrs.|Dr. exists and returns as subexpression #2(\w)\w*
- finds a name and returns the first letter as subexpression $3(\s[a-zA-Z.]*)*
- any number of words between the first name and the last name (subexpression #4)\s(\w+)(,.*|$)
- finds last name (i.e., the last word before ',' or the end of string) and returns as subexpression #5.Full test case:
with t1 as (
select 'my testing on 456897 - Carol M. Smith, Ph.D. ' description from dual union all
select 'my testing on 435670 - Ms. Paulina M. Hall' from dual union all
select 'my testing on 980765 - Mr. John Smith' from dual union all
select 'my testing on 14567 - Mrs. Lena C. Callum' from dual union all
select 'my testing on 555777 - Dr. Paul F. Fairlake' from dual union all
select '234567 - Mr. Ryan M. Palmer, Sr.' from dual union all
select '123456 - Joyce R. Hilton, Ph.D.' from dual
)
select
t1.*
,regexp_replace(
t1.description
,'([^-]+)-\s*((Mr|Ms|Mrs|Dr)[.]\s*)?(\w)\w*(\s[a-zA-Z.]*)*\s(\w+)(,.*|$)'
,'\1- \2\4. \6'
) subs
from t1;
DBFiddle: https://dbfiddle.uk/HNHHzGR4