regex, oracle-database

Extracting text with a regex under two conditions


My code doesn't work:

 regexp_substr('Lorem ipsum dolor sit amet. consectetur', '([^(.|()]+)|((.){0,9})')

The result should be the text up to the first dot; if the text has no dot, the result should be at most its first 10 characters. Is it even possible to do this?

Two example texts:

  1. Lorem ipsum dolor sit amet. consectetur
  2. Donec quis turpis sed sapien ullamcorper viverra sodales a est

This is what the results should look like:

  1. Lorem ipsum dolor sit amet
  2. Donec quis

Solution

  • You can use a replace-based approach here:

    REGEXP_REPLACE('Lorem ipsum dolor sit amet. consectetur',
                          '^([^.]+)\..*|^(.{10}).*',
                          '\1\2') 
    

    Details:

    • ^([^.]+)\..* - if the string contains a dot, capture the text before the first dot, then match the dot and the rest of the string:
      • ^ - start of string
      • ([^.]+) - Group 1 (\1): one or more characters other than .
      • \. - a literal dot
      • .* - zero or more characters, as many as possible
    • | - or
    • ^(.{10}).* - otherwise, capture the first 10 characters ((.{10})) at the start of the string (^) into Group 2 (\2), then match the rest of the string.

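    The replacement is the two backreferences \1\2. Only one alternative matches any given string, so one group holds the captured text and the other is empty. A string with no dot and fewer than 10 characters matches neither alternative, so it is returned unchanged, which still respects the 10-character limit.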
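
As a quick check, the sketch below runs the same replacement over both example strings from the question, plus a short string with no dot. The inline view sample_text and its columns id and txt are hypothetical names used only for illustration; nothing here comes from the original poster's schema.

```sql
-- Sketch only: sample_text, id and txt are made-up names for illustration.
WITH sample_text AS (
  SELECT 1 AS id, 'Lorem ipsum dolor sit amet. consectetur' AS txt FROM dual
  UNION ALL
  SELECT 2, 'Donec quis turpis sed sapien ullamcorper viverra sodales a est' FROM dual
  UNION ALL
  -- No dot and fewer than 10 characters: neither alternative matches,
  -- so REGEXP_REPLACE returns the string unchanged.
  SELECT 3, 'Short' FROM dual
)
SELECT id,
       REGEXP_REPLACE(txt,
                      '^([^.]+)\..*|^(.{10}).*',  -- text before first dot, else first 10 chars
                      '\1\2') AS extracted        -- whichever group captured something
FROM   sample_text
ORDER  BY id;
```

Rows 1 and 2 should come back as the two results requested in the question, and row 3 should come back as Short. Keeping the logic in a single REGEXP_REPLACE call avoids a CASE expression that tests for the dot separately, at the cost of a slightly denser pattern.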