Search code examples
oracle-databaseoracle11goracle10g

Oracle SQL Developer Regex expressions


I am not an expert in regex expression and looking for help. Thanks in advance

I want to extract a formatted substring from a description column. Sample below

FROM

my testing on 456897 - Carol M. Smith, Ph.D. 
my testing on 435670 - Ms. Paulina M. Hall
my testing on 980765 - Mr. John Smith
my testing on 14567 - Mrs. Lena C. Callum
my testing on 555777 - Dr. Paul F. Fairlake
234567 - Mr. Ryan M. Palmer, Sr.
123456 - Joyce R. Hilton, Ph.D.

TO

my testing on 456897 - C.Smith 
my testing on 435670 - Ms. P. Hall
my testing on 980765 - Mr. J. Smith
my testing on 14567 - Mrs. L. Callum
my testing on 555777 - Dr. P. Fairlake
234567 - Mr. R. Palmer
123456 - J. Hilton

My query works for the first and last record. However, the ones with titles are a little bit complicated.

For the records with title, I need to keep the title the first name and the Last Name initial.

SELECT description,
       CASE
           WHEN REGEXP_LIKE(description, '(Mr\.|Ms\.|Mrs\.|Dr\.)') THEN REGEXP_REPLACE(description, '(Ms\.|Mr\.|Mrs\.|Dr\.[A-Z][a-z]+ [A-Z]\.)')
           WHEN NOT REGEXP_LIKE(description, '(Mr\.|Ms\.|Mrs\.|Dr\.)')  THEN REGEXP_REPLACE(description, '(\w)\w*\W+(\w)\w*\W+(\w+),.*', '\1. \3')
           ELSE 'some other validation needed'
       END AS order_regex
       from mytable;

Again thanks for any suggestion. K


Solution

  • I would do it like this:

    select
      t1.*
     ,regexp_replace(
         t1.description
        ,'([^-]+)-\s*((Mr|Ms|Mrs|Dr)[.]\s*)?(\w)\w*(\s[a-zA-Z.]*)*\s(\w+)(,.*|$)'
        ,'\1- \2\4. \6'
        ) subs
    from t1
    

    Short description of this regular expression:

    1. ([^-]+)- - finds the first part of a substring ending with - (subexpression #1)
    2. \s* - any number of space characters
    3. ((Mr|Ms|Mrs|Dr)[.]\s*)? - checks if Mr.|Ms.|Mrs.|Dr. exists and returns as subexpression #2
    4. (\w)\w* - finds a name and returns the first letter as subexpression $3
    5. (\s[a-zA-Z.]*)* - any number of words between the first name and the last name (subexpression #4)
    6. \s(\w+)(,.*|$) - finds last name (i.e., the last word before ',' or the end of string) and returns as subexpression #5.

    Full test case:

    with t1 as (
    select 'my testing on 456897 - Carol M. Smith, Ph.D. ' description from dual union all
    select 'my testing on 435670 - Ms. Paulina M. Hall' from dual union all
    select 'my testing on 980765 - Mr. John Smith' from dual union all
    select 'my testing on 14567 - Mrs. Lena C. Callum' from dual union all
    select 'my testing on 555777 - Dr. Paul F. Fairlake' from dual union all
    select '234567 - Mr. Ryan M. Palmer, Sr.' from dual union all
    select '123456 - Joyce R. Hilton, Ph.D.' from dual  
    )
    select
      t1.*
     ,regexp_replace(
         t1.description
        ,'([^-]+)-\s*((Mr|Ms|Mrs|Dr)[.]\s*)?(\w)\w*(\s[a-zA-Z.]*)*\s(\w+)(,.*|$)'
        ,'\1- \2\4. \6'
        ) subs
    from t1;
    

    DBFiddle: https://dbfiddle.uk/HNHHzGR4