If I use Hive, the below works fine. But if I use Impala, it throws error:
select regexp_replace("foobarbarfoo","bar(?=bar)","<NA>");
WARNINGS: Could not compile regexp pattern: bar(?=bar)
Error: invalid perl operator: (?=
Basically, Impala doesn't support lookahead and lookbehind
Is there a workaround for this today? Maybe use UDF?
Thanks.
Since you are using regexp_replace
, match and capture the part of string you want to keep (but want to use as must-have context) and replace with a backreference. See the regexp_replace
Impala reference:
These examples show how you can replace parts of a string matching a pattern with replacement text, which can include backreferences to any
()
groups in the pattern string. The backreference numbers start at 1, and any\
characters must be escaped as\\
.
So, here, you may use
select regexp_replace("foobarbarfoo","bar(bar)","<NA>\\1");
^ ^ ^^^
Note it will not work to replace consecutive matches, however, it will work in the current scenario and foobarbarfoo
will turn into foo<NA>barfoo
(note that Go regex engine is also RE2, hence this option is chosen at regex101.com).