
How to exclude special characters in a string using regular expressions in Hive?


I want to remove periods (.) and parentheses, both ( and ), from a string.
However, decimal numbers such as 0.12 should be left intact.

So basically if the input is

Hive supports subqueries only in the FROM clause (through Hive 0.12). The subquery has to be given a name because every table in a FROM clause must have a name. Columns in the subquery select list must have unique names.

The output should be

Hive supports subqueries only in the FROM clause through Hive 0.12 The subquery has to be given a name because every table in a FROM clause must have a name Columns in the subquery select list must have unique names


Solution

  • with t as (select 'Hive supports subqueries only in the FROM clause (through Hive 0.12). The subquery has to be given a name because every table in a FROM clause must have a name. Columns in the subquery select list must have unique names.' as mycol)
    
    select  regexp_replace(mycol,'(\\d+\\.\\d+)|[.()]','$1')
    from    t
    

    Hive supports subqueries only in the FROM clause through Hive 0.12 The subquery has to be given a name because every table in a FROM clause must have a name Columns in the subquery select list must have unique names
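The pattern works because of its alternation: `(\d+\.\d+)` is tried first and captures a decimal number in group 1, while `[.()]` matches a single period or parenthesis without capturing anything. Replacing every match with `$1` therefore puts decimal numbers back unchanged and replaces stray punctuation with an empty string. The following is only a rough sketch of that same logic in Python's `re` module, for experimenting outside Hive. Hive's `regexp_replace` uses Java regular expressions, so this is an illustration rather than the Hive implementation, and the name `strip_punctuation` and the shortened sample string are assumptions made for the example.

```python
import re

# Same alternation as the Hive pattern: a decimal number is captured in
# group 1; otherwise a single '.', '(' or ')' is matched but not captured.
PATTERN = re.compile(r"(\d+\.\d+)|[.()]")


def strip_punctuation(text: str) -> str:
    """Remove '.', '(' and ')' while keeping decimal numbers such as 0.12.

    Hypothetical helper name, used here for illustration only.
    """
    # group(1) is None when the punctuation alternative matched,
    # so fall back to an empty string (the equivalent of '$1' in Hive).
    return PATTERN.sub(lambda m: m.group(1) or "", text)


if __name__ == "__main__":
    # Shortened version of the sentence from the question.
    sample = "clause (through Hive 0.12). The subquery"
    print(strip_punctuation(sample))
```

Note that a period directly between digits is always treated as part of a decimal number, so a sentence ending in a digit followed immediately by another digit-leading token (for example `1.2.3`) would keep its first period.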