Extract subdomain from URL using regexp in Golang


In the code sample below, I use regex to extract the subdomain name from a given URL. This sample works, but I don't think I've done it correctly at the point where I compile the regex, mainly where I insert the 'virtualHost' variable. Any suggestions?

package main

import (
    "fmt"
    "regexp"
)

var (
    virtualHost string
    domainRegex *regexp.Regexp
)

func extractSubdomain(host string) string {
    matches := domainRegex.FindStringSubmatch(host)
    if len(matches) > 1 { // len of a nil slice is 0, so no separate nil check is needed
        return matches[1]
    }
    return ""
}

func init() {
    // virtualHost = os.Getenv("VIRTUAL_HOST")
    virtualHost = "login.localhost:3000"

    domainRegex = regexp.MustCompile(`^(?:https?://)?([-a-z0-9]+)(?:\.` + virtualHost + `)*$`)
}

func main() {
    // host := req.Host
    host := "http://acme.login.localhost:3000"

    if result := extractSubdomain(host); result != "" {
        fmt.Printf("Subdomain detected: %s\n", result)
        return
    }

    fmt.Println("No subdomain detected")
}
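
On the specific point of splicing virtualHost into the pattern, here is a minimal sketch of one possible tightening. It assumes the intent is to match virtualHost literally and to require it exactly once after a single subdomain label; both are assumptions about the goal, not something the code above states. regexp.QuoteMeta escapes the dots in virtualHost, which would otherwise match any character. buildDomainRegex is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"regexp"
)

// buildDomainRegex (hypothetical helper) returns a pattern that captures one
// subdomain label directly in front of virtualHost. QuoteMeta escapes regexp
// metacharacters in virtualHost, so its dots only match literal dots.
func buildDomainRegex(virtualHost string) *regexp.Regexp {
	return regexp.MustCompile(`^(?:https?://)?([-a-z0-9]+)\.` + regexp.QuoteMeta(virtualHost) + `$`)
}

// extractSubdomain returns the captured label, or "" when host does not match.
func extractSubdomain(re *regexp.Regexp, host string) string {
	if m := re.FindStringSubmatch(host); m != nil {
		return m[1]
	}
	return ""
}

func main() {
	re := buildDomainRegex("login.localhost:3000")
	fmt.Printf("%q\n", extractSubdomain(re, "http://acme.login.localhost:3000")) // "acme"
	fmt.Printf("%q\n", extractSubdomain(re, "http://login.localhost:3000"))      // ""
	fmt.Printf("%q\n", extractSubdomain(re, "acme.loginXlocalhost:3000"))        // ""
}
```

Passing the compiled pattern as an argument, rather than building it in init, is only a choice made here to keep the sketch easy to test; a package-level variable works the same way.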

Solution

  • The net/url package has a function Parse that parses a URL. The parsed URL value has a method Hostname, which returns the host name without the port.

    package main
    
    import (
        "fmt"
        "log"
        "net/url"
    )
    
    func main() {
        u, err := url.Parse("http://login.localhost:3000")
        if err != nil {
            log.Fatal(err)
        }
        fmt.Println(u.Hostname())
    }
    

    Output:

    login.localhost
    

    See https://play.golang.com/p/3R1TPyk8qck

    Update:

    My previous answer only dealt with parsing the host name. Since then I have been using the following package to find the domain suffix of a host name. Once you have that, it is simple to strip the domain and leave only the subdomain prefix.

    https://pkg.go.dev/golang.org/x/net/publicsuffix

    I have found it tricky to tell the subdomain apart from the host without help from this package, which identifies common public suffixes. For instance, internally we may have a domain coming from a Kubernetes ingress:

    foo.bar.host.kube.domain.com.au
    

    Here the host is "host" and the subdomain is "foo.bar". Even with the publicsuffix package, nothing tells the code that "kube" belongs to the internal domain rather than to the host, because the package only recognizes public suffixes such as "com.au". So you have to add some hinting of your own to match such names.
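
The hinting idea can be sketched with the standard library alone. This is a minimal illustration, not the setup described above: splitHost and the baseDomains hint list are hypothetical names, and the list is assumed to be supplied by the caller (for example from configuration). The sketch parses the URL, strips the first matching base domain from the host name, treats the last remaining label as the host, and treats everything before it as the subdomain.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// splitHost (hypothetical helper) strips the first matching entry of
// baseDomains from the host name in rawURL and splits what is left into
// a subdomain and a host label. ok is false when nothing matches.
func splitHost(rawURL string, baseDomains []string) (subdomain, host string, ok bool) {
	// Without a scheme, url.Parse does not treat the input as a host,
	// so prefix "//" to mark it as an authority.
	if !strings.Contains(rawURL, "://") {
		rawURL = "//" + rawURL
	}
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", "", false
	}
	name := strings.ToLower(u.Hostname()) // Hostname drops any port

	for _, base := range baseDomains {
		suffix := "." + strings.ToLower(base)
		if !strings.HasSuffix(name, suffix) {
			continue
		}
		prefix := strings.TrimSuffix(name, suffix)
		if prefix == "" {
			continue
		}
		// The last remaining label is the host; anything before it is the subdomain.
		if i := strings.LastIndex(prefix, "."); i >= 0 {
			return prefix[:i], prefix[i+1:], true
		}
		return "", prefix, true
	}
	return "", "", false
}

func main() {
	// Assumed hint: "kube.domain.com.au" is the internal base domain.
	bases := []string{"kube.domain.com.au"}
	sub, host, ok := splitHost("https://foo.bar.host.kube.domain.com.au/path", bases)
	fmt.Println(sub, host, ok) // foo.bar host true
}
```

A fixed hint list is the simplest form of hinting. A base domain derived from publicsuffix could be added to the same list, at the cost of an extra dependency.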