Search code examples
sqlamazon-redshiftdata-warehouseregexp-substrlistagg

Redshift REGEXP_SUBSTR get last occurrence of a match


I have a list of page events of all types in a column value ordered by time asc that I got using listagg method. listagg(page,';') within group (order by time)

I want to get the occurrence of the last match that matches the regex regexp_substr(event_list,'/step[0-9]+[^;]*')

According to the docs "A positive integer that indicates the position within source_string to begin searching. The position is based on the number of characters, not bytes, so that multibyte characters are counted as single characters. The default is 1. If position is less than 1, the search begins at the first character of source_string. If position is greater than the number of characters in source_string, the result is source_string."

Based on this I need to know the exact occurrence number which I do not know. How to get the last match in this case? eg : /step1;somethging;somethig;/step2;something;/step3;something;

I want to match step3.

PS: Ordering by time desc and getting the first match is not an option here.


Solution

  • Use regexp_count to determine how many matches there are (n) & then use regexp_substr to get the nth match.

    select 
      '/step1;somethging;somethig;/step2;something;/step3;something;' string
    , '/step[0-9]+[^;]*' pat
    , regexp_count(string, pat) n
    , regexp_substr(string, pat, 1, n) last_part
    

    outputs:

                                                           string                pat    n    last_part
    /step1;somethging;somethig;/step2;something;/step3;something;   /step[0-9]+[^;]*    3       /step3
    

    if / could be treated as a delimiter, then you may alternatively employ the following strategy

    reverse the string, split by / & take the first part. reverse again, prefixing / and apply the regular expression to extract the step:

    example:

    select 
      '/step1;somethging;somethig;/step2;something;/step3;something;' string
    , '/' || reverse(split_part(reverse(string), '/', 1)) last_part
    , regexp_substr(last_part, '/step[0-9]+[^;]*') extract_step
    

    outputs:

                                                           string           last_part   extract_step 
    /step1;somethging;somethig;/step2;something;/step3;something;   /step3;something;         /step3