Search code examples
sqlamazon-redshiftregexp-substr

SQL extracting date substring using regexp_subtr


I've got a field named signal_notes containing strings like the follow (this would be a single value of signal_notes):

"{ ^search_date^: ^2021-01-05^, 
^filing_date^: ^^, 
^expiry_date^: ^^, 
^other_liens^: ^^, 
^who_1st_positon^: ^^, 
^who_2nd_position^: ^^, 
^who_3rd_position^: ^^, 
^priority_from_1^: ^^, 
^priority_from_2^: ^^, 
^priority_from_3^: ^^, 
^notes^: ^^ 
^client_facing_notes^: ^^ 
 }"

Sometimes, the ^expiry_date^ line will have a date between the ^'s in the format 'YYYY-MM-DD'.

My new field expiry_date will ideally be in the format 'YYYY-MM-DD' with the date string from the signal_notes field.

This is what I've got so far, but it returns nothing.

select
(regexp_substr(signal_notes, 'expiry_date [0-9-]*' )) as expiry_date
from db

Ive also tried

(regexp_substr( signal_notes, '^expiry_date^: ^[0-9-]*^' )) as first_as_of_date_context

with the same results.

Any advice welcome


Solution

  • If I understand correctly, you want a subexpression. The ^ is a pain, one way around that is:

    regexp_substr(signal_notes, '.expiry_date.: .([0-9-]*).', 1, 1, 'e')
    

    This is very similar to your last attempt, except it has the subexpression so it should only return the date.

    You should also be able to use \\ as an escape character:

    regexp_substr(signal_notes, '\\^expiry_date\\^: \\^([0-9-]*)\\^', 1, 1, 'e')