Search code examples
regexiisregex-lookaroundsregex-groupregex-greedy

RegEx for failing subdomains


Basically, I would like to check a valid URL that does not have subdomain on it. I can't seem to figure out the correct regex for it.

Example of URLs that SHOULD match:

  • example.com
  • www.example.com
  • example.co.uk
  • example.com/page
  • example.com?key=value

Example of URLs that SHOULD NOT match:

  • test.example.com
  • sub.test.example.com

Solution

  • Here, we would start with an expression which is bounded on the right with .com or .co.uk and others, if desired, then we would swipe to left to collect all non-dot chars, add an optional www and https, then we would add a start char ^ which would fail all subdomains:

    ^(https?:\/\/)?(www\.)?([^.]+)(\.com|\.co\.uk)(.+|)$
    

    Other TLDs can be added to this capturing group:

    (\.com|\.co\.uk|\.net|\.org|\.business|\.edu|\.careers|\.coffee|\.college)
    

    And the expression can be modified to:

    ^(https?:\/\/)?(www\.)?([^.]+)(\.com|\.co\.uk|\.net|\.org|\.business|\.edu|\.careers|\.coffee|\.college)(.+|)$
    

    Flexibility

    I can't think of something to make the TLDs too flexible, since this is a validation expression. For instance, if we would simplify it to:

    ^(https?:\/\/)?(www\.)?([^.]+)(\.[a-z]+)(\.uk?)?[a-z?=\/]+$
    

    it might work for the URLs listed in the question, but it would also pass:

    example.example
    

    which is invalid. We can only use this expression:

    ^(https?:\/\/)?(www\.)?([^.]+)(\.[a-z]+)(\.uk?)?[a-z?=\/]+$
    

    if we would know that what we pass, it is already a URL.

    NOT FUNCTIONAL DEMO

    Demo

    This snippet just shows that how the capturing groups work:

    const regex = /^(https?:\/\/)?(www\.)?([^.]+)(\.com|\.co\.uk)(.+|)$/gm;
    const str = `example.com
    www.example.com
    example.co.uk
    example.com/page
    example.com?key=value
    
    test.example.com
    sub.test.example.com`;
    let m;
    
    while ((m = regex.exec(str)) !== null) {
        // This is necessary to avoid infinite loops with zero-width matches
        if (m.index === regex.lastIndex) {
            regex.lastIndex++;
        }
        
        // The result can be accessed through the `m`-variable.
        m.forEach((match, groupIndex) => {
            console.log(`Found match, group ${groupIndex}: ${match}`);
        });
    }

    RegEx Circuit

    jex.im visualizes regular expressions:

    enter image description here

    RegEx

    If this expression wasn't desired, it can be modified/changed in regex101.com.

    DEMO