regex, apache-spark, regex-group, regexp-replace, metacharacters

Dangling metacharacter * in Spark SQL


The regex below works in Hive but not in Spark.

In Spark it throws the error "dangling metacharacter * at index 3":

select regexp_extract('a|b||c','^(\\|*(?:(?!\\|\\|\\w(?!\\|\\|)).)*)');

I also tried escaping the * as \\*, but it still throws the same "dangling metacharacter * at index 3" error.


Solution

  • You can use regexp_replace instead:

    regexp_replace(col, '^(.*)[|]{2}.*$', '$1')
    


    Regex details:

    • ^ - start of string
    • (.*) - Capturing group 1 (the replacement pattern refers to this group's value with the $1 backreference): zero or more chars other than line break chars, as many as possible, so the group extends up to the last || in the string
    • [|]{2} - two consecutive pipe characters (the literal string ||)
    • .* - the rest of the line
    • $ - end of string.
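
The pattern above contains no backslashes, so it does not depend on how string literals are unescaped before the regex engine sees them. As a quick way to check its behaviour outside Spark, here is a minimal sketch using Python's standard re module. The helper name keep_before_last_double_pipe and the sample inputs are hypothetical, and Python writes the group reference as \1 where Spark's Java-based regex uses $1.

```python
import re

# Same pattern as in the answer; note that it contains no backslashes.
PATTERN = re.compile(r'^(.*)[|]{2}.*$')


def keep_before_last_double_pipe(value: str) -> str:
    """Hypothetical helper mirroring regexp_replace(col, '^(.*)[|]{2}.*$', '$1')."""
    # Python spells the group 1 backreference \1; Spark (Java regex) spells it $1.
    return PATTERN.sub(r'\1', value)


if __name__ == "__main__":
    # Sample inputs are illustrative only.
    for sample in ['a|b||c', 'a||b||c', 'a|b']:
        print(sample, '->', keep_before_last_double_pipe(sample))
```

Because (.*) is greedy, everything from the last || onward is dropped, and a string without || is returned unchanged since the pattern does not match it.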