I have a requirement in hive, where need to replace the non-alphanumeric characters with _. The data in column is "Pranav_Wagde_job_1163_W.Pranav Ltr-Resume 04.07.2016.pdf"
SELECT FILENAME, REGEXP_REPLACE(FILENAME,"[^0-9a-zA-Z._]", "_"), FROM HIVEDB.FILEDATA WHERE FILENAME='Pranav Wagde_job_1163_W.Pranav Ltr-Resume 04.07.2016.pdf';
The output is
Pranav_Wagde_job_1163_W.Pranav Ltr-Resume 04.07.2016.pdf
I am unable to replace the dots in the file name. How to replace all the dots in data except the extension of the file?
If a positive lookahead is supported, you could add an alternation |
matching all dots except one that has 1+ word chars at the right followed by the end of the string.
(?:[^0-9a-zA-Z._]|\.(?!\w+$))