Search code examples
sqlemailselectgoogle-bigqueryregexp-replace

How can I extract the domain from an url or email with a sql select statement?


I want is to extract only the domain, without the top domain.

I have two fields, email and website. With two different sql SELECT statements, one for each of the fields, I want to create two new fields, EmailDomain and WebsiteDomain.

Examples of fields in website:

Examples of values in email:

  • name@domain.stormgeo.com
  • name@domain.com
  • name@test.domain.com
  • name@domain.co.uk
  • etc.

In all the examples above, I want the result to be "domain" (or "example") only.

I have tested and come up with some statements that almost makes the job, but they don't fix all the examples and they still have the top domain.

What I have so far is:

SELECT Account.website, REGEXP_REPLACE (Account.website, '(http[s]?://)?(www\\.)?(.*?)((/|:)(.)*|$)', '\\3') AS `WebsiteDomain` FROM Account

and

SELECT Leads.email, REGEXP_REPLACE (Leads.email, '^.*@([^\\.]+)\\.\\w+','\\1') AS `EmailDomain` FROM Leads


Solution

  • Consider below

    select website, regexp_replace(net.reg_domain(website), '.' || net.public_suffix(website) || '$', '') 
    from `project.dataset.table`   
    

    if applied to sample data in your question - output is

    enter image description here

    You should be able to apply same to email field