The regex below works in Hive but not in Spark. In Spark it throws the error "dangling metacharacter * at index 3":
select regexp_extract('a|b||c','^(\\|*(?:(?!\\|\\|\\w(?!\\|\\|)).)*)');
I also tried escaping * with \\*, but it still throws "dangling metacharacter * at index 3".
You can use
regexp_replace(col, '^(.*)[|]{2}.*$', '$1')
See the regex demo.
Regex details:
^ - start of string
(.*) - Capturing group 1 (this group value is referred to with the $1 replacement backreference in the replacement pattern): any zero or more chars other than line break chars, as many as possible (the rest of the line)
[|]{2} - double pipe (the || string)
.* - the rest of the line
$ - end of string.
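
Since Spark's regexp functions are backed by java.util.regex, the pattern's behavior can be checked outside Spark with a small Java sketch. The class name PipePrefix and the method beforeDoublePipe are hypothetical names used only for illustration; the pattern and the $1 replacement are the ones from the regexp_replace call above.

```java
import java.util.regex.Pattern;

final class PipePrefix {
    // Same pattern as in the regexp_replace call above.
    private static final Pattern BEFORE_DOUBLE_PIPE =
            Pattern.compile("^(.*)[|]{2}.*$");

    // Hypothetical helper: keeps the text before the last "||".
    // If the input has no "||", the pattern does not match and the
    // input comes back unchanged.
    static String beforeDoublePipe(String s) {
        return BEFORE_DOUBLE_PIPE.matcher(s).replaceAll("$1");
    }

    public static void main(String[] args) {
        System.out.println(beforeDoublePipe("a|b||c"));   // a|b
        System.out.println(beforeDoublePipe("a||b||c"));  // a||b
        System.out.println(beforeDoublePipe("a|b"));      // a|b
    }
}
```

Because (.*) is greedy, the match cuts at the last || in the string, so 'a||b||c' becomes 'a||b'. If the cut should happen at the first || instead, a lazy (.*?) in group 1 would be the usual adjustment.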