Search code examples
hiveregexp-replace

How to Replace special characters in hive columns?


I have a requirement in hive, where need to replace the non-alphanumeric characters with _. The data in column is "Pranav_Wagde_job_1163_W.Pranav Ltr-Resume 04.07.2016.pdf"

SELECT FILENAME, REGEXP_REPLACE(FILENAME,"[^0-9a-zA-Z._]", "_"), FROM HIVEDB.FILEDATA WHERE FILENAME='Pranav Wagde_job_1163_W.Pranav Ltr-Resume 04.07.2016.pdf';

The output is

Pranav_Wagde_job_1163_W.Pranav Ltr-Resume 04.07.2016.pdf

I am unable to replace the dots in the file name. How to replace all the dots in data except the extension of the file?


Solution

  • If a positive lookahead is supported, you could add an alternation | matching all dots except one that has 1+ word chars at the right followed by the end of the string.

    (?:[^0-9a-zA-Z._]|\.(?!\w+$))
    

    Regex demo