Search code examples
regexstringhivehiveqlnumeric

Hive extract numeric value from a string


I have a table as:

column1                         column2 
The first value is 200 gb        need to restart (2 times)
The 2nd value is 700 gb          need (optional) to restart (10 times)

I am trying to get the numeric values from the table. The expected output is

column1_numeric      column2_numeric 
200                   2
700                   10

For column1: i trying to get data using: regexp_replace(column1, '[^0-9]', '') as column1_numeric; but this is not working for the 2nd row and returns 2700

For column2: I am trying as: regexp_replace(regexp_extract(column2,'\\((.*?)\\)'), '[^0-9]', '') as column2_numeric but this is also not working for 2nd row and returns null values

Any suggestions please?


Solution

  • Extract last numeric value from the string '(\\d+)([^0-9]*)$':

    select 
          regexp_extract(column1,'(\\d+)([^0-9]*)$',1) as column1_numeric,
          regexp_extract(column2,'(\\d+)([^0-9]*)$',1) as column2_numeric
       ...
    

    It extracts

    column1_numeric      column2_numeric 
    200                   2
    700                   10
    

    Also instead of [^0-9] (not a digit) you can use \\D, which is a bit shorter:

    '(\\d+)(\\D*)$'