I have a query to extract the email domain from a column that maintained with multiple email addresses. The column value is aaa@yahoo.com, bbb@hotmail.com, and the column size is VARCHAR2(3500 CHAR)
Below query used listagg function to extract the domains and prefixed with regular expression:
SELECT DISTINCT
LISTAGG( CHR(91)||'a-zA-Z0-9._%-'||CHR(93)||'+' ||REGEXP_SUBSTR(EMAIL_STRING,'@' || CHR(91) || '^,'
|| CHR(93) ||'+', 1, LEVEL), ',')
WITHIN GROUP (ORDER BY LEVEL) AS USER_EMAILS_DOMAIN
FROM EMAIL_PARAMETER WHERE NAME='UserEmails'
CONNECT BY REGEXP_SUBSTR(EMAIL_STRING,'@'||CHR(91)||'^,'||CHR(93)||'+', 1, LEVEL) IS NOT NULL;
Expected outcome: [a-zA-Z0-9.%-]+@yahoo.com,[a-zA-Z0-9.%-]+@hotmail.com
The string is within 4000 characters, however Oracle return ORA-01489: result of string concatenation is too long
I tried to use LISTAGG
with ON OVERFLOW TRUNCATE
option, it returns a long string with the second email, bbb@hotmail.com keeps on repeating.
I also try on XMLTAGG
, also I'm getting a very long string with both emails are repeating.
select rtrim(xmlagg(xmlelement(e,CHR(91)||'a-zA-Z0-9._%-'||CHR(93)||'+' || EMAIL_STRING,',').extract('//text()') order by REGEXP_SUBSTR(EMAIL_STRING,'@' || CHR(91) || '^,' || CHR(93) ||'+', 1, LEVEL)).getclobval(),',') USER_EMAILS_DOMAIN
FROM EMAIL_PARAMETER WHERE NAME='UserEmails'
CONNECT BY REGEXP_SUBSTR(EMAIL_STRING,'@'||CHR(91)||'^,'||CHR(93)||'+', 1, LEVEL) IS NOT NULL;
It looks as if DISTINCT
doesn't do what you hoped it would.
First select all distinct domains, then apply listagg
to the result.
Simplified:
select listagg(domain, ',') within group (order by domain) final_result --> 2. aggregation
from
(select distinct regexp_substr(email_string, ...) as domain --> 1. distinct domains
from email_parameter
)