Search code examples
regexsubstringamazon-redshift

Regular expression in Redshift that returns all words prior to " at" in a string


Using REGEX_SUBSTR() in redshift I want to search a string and extract all words that precede " at" in a string. the space before the at is key.

I tested [^.]*(?=\sat) on regex tester it works.

But inside Redshift I continually get errors that the regex expression cannot be interpreted. Consistently getting this error

I've tried using double backslash (\\s) but no luck with that either. For some reason inside the positive lookahead assertion Redshift is not recognizing any escape characters.

error:  Invalid preceding regular expression prior to repetition operator.  The error occurred while parsing the regular expression: '[^.]*(?>>>HERE>>>=\sat)'.
  code:      8002
  context:   T_regexp_init
  query:     0
  location:  sql_regex_funcs.hpp:175

In a perfect world running

SELECT regexp_substr('withdrawal from Credit Union at main avenue', '[^.]*(?=\\sat)')

would return withdrawal from Credit Union


Solution

  • You can use

    SELECT regexp_substr('withdrawal from Credit Union at main avenue', '([^.]*)\\sat', 1, 1, 'e')
    

    Here,

    • ([^.]*)\sat matches zero or more chars other than . as many times as possible up to a whitespace and at substring
    • 1, 1, 'e' mean that the search starts at the first character in the input string, the first occurrence will be returned and the substring returned will be the first capturing group value, i.e. what was matched with the first and only parenthesized regex part, ([^.]*).