Search code examples
sqlregexoracle-databasecaseregexp-substr

Limit regexp_substr responses when unwanted characters appear in string (return null responses)


Attempting to limit responses when certain characters appear in the string- i.e. I want 'The cow went for a walk' However

1 if within 20 characters the word 'BAD' appears such as the "The BAD mean cow went for a walk" it would return a null value
2. The same if there was punctuation in the middle of the sentence, so that
     "The cow came back. But the dog went for a walk" would not return a value- 

Code below.

with test (id, col) as (
  select 1, 'The cow went for a walk'                         from dual union all --want this
  select 2, 'The BAD  mean cow went for a walk'                 from dual union all   --do not want this
  select 3, 'The cow came back. But the dog went for a walk'    from dual) --do not want this 
  select id, col, 
       regexp_substr(col,'(cow).{1,40}(walk)',1,1,'i') rs
regexp_substr(col,'(cow).{1,40}(([^.])(walk))',1,1,'i') rs2
      from test

RS2 is my first attempt w/o success to limit the response.


Solution

  • Here is a solution that uses two calls to regexp_like() to ensure that the value matches the criteria:

    select
        col,
        case 
            when 
                regexp_like(col, 'cow[^.]{1,40}walk', 'i')
                and not regexp_like(col, '^.{1,20}bad', 'i')
            then col
        end new_col
    from test
    

    The first function call ensures that the string contains the work 'cow' followed by 'walk' within 40 characters maximum and with no dot in between. The second call eliminates strings that contain the word 'bad' within the first 20 characters.

    Demo on DB Fiddle:

    with test (id, col) as (
        select 1, 'The cow went for a walk' from dual 
        union all 
        select 2, 'The BAD  mean cow went for a walk' from dual
        union all
        select 3, 'The cow came back. But the dog went for a walk' from dual
    )
    select
        col,
        case 
            when 
                regexp_like(col, 'cow[^.]{1,40}walk', 'i')
                and not regexp_like(col, '^.{1,20}bad', 'i')
            then col
        end new_col
    from test
    
    COL                                            | NEW_COL                
    :--------------------------------------------- | :----------------------
    The cow went for a walk                        | The cow went for a walk
    The BAD  mean cow went for a walk              | null                   
    The cow came back. But the dog went for a walk | null