Search code examples
sqloracle-databaseplsqlregexp-replace

Removing one word in a string (or between two white spaces)


I have this:

Dr. LeBron Jordan
John Bon Jovi

I would like this:

Dr. Jordan
John Jovi

How do I come about it? I think it's regexp_replace.

Thanks for looking. Any help is much appreciated.


Solution

  • Here's a way using regexp_replace as you mentioned, using several forms of a name for testing. More powerful than nested SUBSTR(), INSTR() but you need to get your head around regular expressions, which will allow you way more pattern matching power for more complex patterns once you learn it:

    with tbl as (
      select 'Dr. LeBron Jordan' data from dual
      union
      select 'John Bon Jovi' data from dual
      union
      select 'Yogi Bear' data from dual
      union
      select 'Madonna' data from dual 
      union
      select 'Mr. Henry Cabot Henhouse' data from dual  ) 
    
    select regexp_replace(data, '^([^ ]*) .* ([^ ]*)$', '\1 \2') corrected_string from tbl;
    
    CORRECTED_STRING
    ----------------
    Dr. Jordan
    John Jovi
    Madonna
    Mr. Henhouse
    Yogi Bear
    

    The regex can be read as:

    ^      At the start of the string (anchor the pattern to the start)
    (      Start remembered group 1
    [^ ]*  Zero or more characters that are not a space
    )      End remembered group 1
    space  Where followed by a literal space
    .      Followed by any character
    *      Followed by any number of the previous any character
    space  Followed by another literal space
    (      Start remembered group 2
    [^ ]*  Zero or more characters that are not a space
    )      End remembered group 2
    $      Where it occurs at the end of the line (anchored to the end)
    

    Then the '\1 \2' means return remembered group 1, followed by a space, followed by remembered group 2.

    If the pattern cannot be found, the original string is returned. This can be seen by surrounding the returned groups with square brackets and running again:

    ...
    select regexp_replace(data, '^([^ ]*) .* ([^ ]*)$', '[\1] [\2]')
    corrected_string from tbl;
    
    CORRECTED_STRING
    [Dr.] [Jordan]
    [John] [Jovi]
    Madonna
    [Mr.] [Henhouse]
    Yogi Bear