have the following string (it's a salesforce query, but not important):
IF(OR(CONTAINS(EmailDomain,"yahoo"),CONTAINS(EmailDomain,"gmail"),
CONTAINS("protonmail.com,att.net,chpmail.com,smail.com",EmailDomain)),
"Free Mail","Business Email")
and I want to get an array of all substrings that are encapsulated between double quotes like so:
['yahoo',
'gmail',
'protonmail.com,att.net,chpmail.com,smail.com',
'Free Mail',
'Business Email']
in python I do:
re.findall(r'"(.+?)"', <my string>)
but is there a way to replicate this in Snowflake?
I've tried
SELECT
REGEXP_SUBSTR('IF(OR(CONTAINS(EmailDomain,"yahoo"),CONTAINS(EmailDomain,"gmail"),
CONTAINS("protonmail.com,att.net,chpmail.com,smail.com",EmailDomain)),
"Free Mail","Business Email")', '"(.+?)"') as emails;
but I get this:
"yahoo"),CONTAINS(EmailDomain,"gmail"
You can use
select split(trim(regexp_replace(regexp_replace(col, '"([^"]+)"|.', '\\1|'),'\\|+','|'), '|'), '|');
Details:
regexp_replace(col, '"([^"]+)"|.', '\\1|')
- finds any strings between the closest double quotes while capturing the part inside quotes into Group 1, or matching any single char and replaces each match with Group 1 contents + |
char (see the regex demo)regexp_replace(...,'\\|+','|')
- this shrinks all consecutive pipe symbols into a single occurrence of a |
char (see this regex demo)trim(..., '|')
- removes |
chars on both ends of the stringsplit(..., '|')
- splits the string with a |
char.