Search code examples
hivehiveqlregexp-replace

Remove all characters between two substrings in Hive SQL query


I have a column of strings that looks like this:

STRING:SECTION1/SECTION2/0000123456789/SECTION3/SECTION4 STRING:SECTION1/SECTION2/0000987654321/SECTION3/SECTION4 STRING:SECTION1/SECTION2/00005552121X/SECTION3/SECTION4 STRING:SECTION1/SECTION2/00005552222:ID/SECTION3/SECTION4

I am trying to use REGEXP_REPLACE to replace the variable length, alpha/num/special char string from the middle and replace it with something generic so that they all look like this:

STRING:SECTION1/SECTION2/id_number_removed/SECTION3/SECTION4

I have been trying all morning to try to find the right regex expression to replace everything between '/SECTION2/' and '/SECTION3/' but have had no success.


Solution

  • Replace regex pattern 'SECTION2/[^/]+/SECTION3' with 'SECTION2/id_number_removed/SECTION3'. [^/]+ means 1 or more characters that are not slashes.

    select regexp_replace(
        'STRING:SECTION1/SECTION2/00005552222:ID/SECTION3/SECTION4',
        'SECTION2/[^/]+/SECTION3', 
        'SECTION2/id_number_removed/SECTION3');
    

    which gives

    STRING:SECTION1/SECTION2/id_number_removed/SECTION3/SECTION4