Search code examples
regexhadoophivepresto

Hive query to extract a column which has alphanumeric characters


I have a requirement in which I need to extract the data based on a filter on a column and the filter would be to extract only alphanumeric values which means that it should contain at least one alphabet and a number for consideration. For example if I have five numbers such as 333,abc,ab333,+33,+ab33 the output should have only ab333 and +ab33. I was trying to implement this using the rlike function and the query is as below but this is giving all the records in the table.

select column_name from table_name where column_name rlike '^[a-zA-Z0-9]+$';

I also tried a different approach by using the below query but in case of special characters such as + the below query gives the wrong result.

select column_name from table_name where column_name not rlike '^[0-9]+$';

Could anybody guide me regarding the mistake of if there is a different approach for this.


Solution

  • You can use

    RLIKE '^\\+?(?:[0-9]+[a-zA-Z]|[a-zA-Z]+[0-9])[0-9a-zA-Z]*$'
    

    Details:

    • ^ - start of string
    • \+? - an optional + symbol
    • (?:[0-9]+[a-zA-Z]|[a-zA-Z]+[0-9]) - one or more digits followed with a letter or one or more letters followed with a digit and then
    • [0-9a-zA-Z]* - zero or more alphanumeric chars
    • $ - end of string.