How can I use the regexp_replace function in Hive to strip the markup from this string:
Abc abc ","<a href="http://,557244.html" id=" ">abc abc abc .</a>
I want to get: Abc abc abc abc abc
Does anyone know how?
Assuming column WTF contains
Abc abc ","<a href="http://,557244.html" id=" ">abc abc abc .</a>
then regexp_replace(regexp_replace(WTF,'<[^>]*>',''), '[",.]','')
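removes the markup tags first, then the punctuation characters (double quote, comma, period), to return
Abc abc abc abc abc
(with one trailing space left over from in front of the removed period; wrap the expression in trim() if that matters).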
That's plain old regular expression syntax, nothing specific to Hive.
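To illustrate that the two patterns are not Hive-specific, here is a minimal sketch of the same two-step replacement in Python's re module. The function name strip_markup and the variable sample are made up for this example. The final strip() call is an addition that plays the role trim() would play in Hive.

```python
import re


def strip_markup(text: str) -> str:
    """Apply the same two replacements as the nested regexp_replace calls."""
    # Step 1: remove anything that looks like a tag: '<', then any run of
    # characters other than '>', then the closing '>'.
    without_tags = re.sub(r'<[^>]*>', '', text)
    # Step 2: remove every double quote, comma, and period.
    without_punct = re.sub(r'[",.]', '', without_tags)
    # Removing the final period leaves a trailing space, so trim it
    # (this step is an addition, not part of the original expression).
    return without_punct.strip()


sample = 'Abc abc ","<a href="http://,557244.html" id=" ">abc abc abc .</a>'
print(strip_markup(sample))
```

Note that the tag pattern is deliberately simple: it would also cut at a '>' that appears inside an attribute value, which is fine for a string like the one above but not for arbitrary HTML.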