Search code examples
sqlregexprestotrino

In Presto SQL, for string column, how to remove text starting from a pattern in each string?


I'm trying to do some string cleanup and wanted to keep only the part that's before a given pattern and remove everything including the pattern and everything after. The pattern is 'hi {{user_name}}'. For example:

Input: What a good day. hi {{user_name}} how are you

Expected output: What a good day.

Input: What a good day. hi {{user_name}} let me know how I can help.

Expected output: What a good day.

Input: What a great day!

Expected output: What a great day!

I tried to do split by pattern and keep only the first part of the split result, but wasn't able to find a good way in SQL


Solution

  • You can try using regex_replace with hi {{user_name}}.* pattern (explanation at regex101):

    -- sample data
    with dataset(str) as(
        values
            ('What a good day. hi {{user_name}} how are you'),
            ('What a good day. hi {{user_name}} let me know how I can help.'),
            ('What a great day!')
    
    )
    
    -- query 
    SELECT regexp_replace(str, 'hi {{user_name}}.*', '')
    FROM dataset;
    

    Output (note that this will keep the space at the end):

    _col0
    What a good day.
    What a good day.
    What a great day!