I have three fields: clean_email, email, and email_domain
Clean email: is the information before the domain. So if an email was dataguy@yahoo.com, this field would just say dataguy
Email is the whole email: dataguy@yahoo.com
email_domain is just the domain: yahoo.com
I am needing to Count the number clean emails that have different domains. We are noticing that some emails may be dataguy@yahoo.com, dataguy@hotmail.com, or dataguy@outlook.com. You notice the email is the same, but the domain is different, so we are trying to recognize when this happens. The total count of domains for this individual would be 3, and I need those specific domains listed out.
My query currently is:
SELECT clean_email, email, COUNT(DISTINCT email_domain)
FROM email
GROUP BY clean_email, email
I have tried using COUNT
in multiple ways, but it's not returning what I need. It usually returns 1 row.
You can use substring_index()
:
SELECT substring_index(email, '@', 1) as clean_email,
COUNT(DISTINCT substring_index(email, '@', -1))
FROM email
GROUP BY clean_email;
EDIT:
If you want the domains, then use GROUP_CONCAT()
:
SELECT substring_index(email, '@', 1) as clean_email,
COUNT(DISTINCT substring_index(email, '@', -1)),
GROUP_CONCAT(DISTINCT substring_index(email, '@', -1))
FROM email
GROUP BY clean_email
HAVING COUNT(DISTINCT substring_index(email, '@', -1)) > 1;