
regexp_replace function in HIVE


How can I use the regexp_replace function in Hive to strip the markup from this string?

Abc abc ","<a href="http://,557244.html" id=" ">abc abc abc .</a> 

I want to get:

Abc abc abc abc abc

Does anyone know how to do this?


Solution

  • Assuming the column WTF contains

    Abc abc ","<a href="http://,557244.html" id=" ">abc abc abc .</a>

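    then regexp_replace(regexp_replace(WTF,'<[^>]*>',''), '[",.]','') works in two passes: the inner call deletes every tag (anything from a < up to the next >), and the outer call deletes the remaining double quotes, commas, and periods, to return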

    Abc abc abc abc abc

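    That's ordinary regular expression syntax (Hive uses Java's regex engine), nothing specific to Hive. Note that the result keeps a trailing space where the final period was removed; wrap the expression in trim() if that matters.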
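
Since the patterns are not Hive-specific, the two passes can be checked outside Hive before running a query. The following is a minimal sketch in Python, assuming its re module treats these two simple patterns the same way Hive's Java-based engine does; the function name strip_markup and the final strip() call (standing in for Hive's trim()) are illustrative additions, not part of the original answer.

```python
import re


def strip_markup(text: str) -> str:
    """Mirror the nested regexp_replace calls: drop tags, then punctuation."""
    # Inner pass: remove anything from '<' up to the next '>'.
    no_tags = re.sub(r'<[^>]*>', '', text)
    # Outer pass: remove double quotes, commas, and periods.
    no_punct = re.sub(r'[",.]', '', no_tags)
    # Assumption: leading/trailing whitespace is unwanted (like trim() in Hive).
    return no_punct.strip()


sample = 'Abc abc ","<a href="http://,557244.html" id=" ">abc abc abc .</a>'
print(strip_markup(sample))
```

Note that '<[^>]*>' is a simple heuristic: it would cut a tag short if an attribute value itself contained a > character, which is not the case in the sample string.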