
How to remove multiple characters at once using regexp_replace() in Hive?


I'm trying to clean up my data in a Hive table. I need to replace some characters in a column, but I can't figure out how to remove multiple characters at once using regexp_replace() in Hive SQL.

The following is straightforward and works as expected:

select regexp_replace('abc-de-ghi', '-','');

and outputs: abcdeghi

But I don't know how to remove several different characters from a string in one call:

select regexp_replace('abc-de/ghi@jkl:mn#op', <i-dont-know-what-goes-here>,'');

Can someone please help me with this?


Solution

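  • Use a character class: list the characters you want to remove inside square brackets, for example '[-/@:#]'. Put the hyphen first or last inside the brackets so it is read as a literal '-' rather than as a range: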

    select regexp_replace('abc-de/ghi@jkl:mn#op','[-/@:#]','');
    

    Result:

    OK
    abcdeghijklmnop
    Time taken: 4.656 seconds, Fetched: 1 row(s)
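
A few variations on the same idea, as a sketch rather than a tested recipe. Hive's documentation describes regexp_replace() as using Java regular expression syntax, so the usual character-class rules should apply. The negated class in the last query assumes you want to keep only ASCII letters and digits, which is an assumption about the data and not something stated in the question.

```sql
-- Hyphen first inside the brackets: treated as a literal character.
select regexp_replace('abc-de/ghi@jkl:mn#op', '[-/@:#]', '');

-- Hyphen last inside the brackets: also treated as a literal.
select regexp_replace('abc-de/ghi@jkl:mn#op', '[/@:#-]', '');

-- Negated class: remove every character that is NOT an ASCII letter or digit.
-- Assumption: anything outside a-z, A-Z, 0-9 is unwanted.
select regexp_replace('abc-de/ghi@jkl:mn#op', '[^a-zA-Z0-9]', '');
```

For this sample string all three queries should produce the same cleaned value. The negated form is handy when the list of unwanted characters is open-ended, but it will also strip spaces, underscores, and non-ASCII letters, so check it against real column values first.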