javascript, regex, google-chrome-extension

How to exclusively detect subdomains of a URL with a regular expression


I am making a Chrome extension that is given a list of domains to compare against the URL of the active tab. For example, if the list contains "google", the extension should treat "docs.google.com" as being on the list. I have gotten this part to work. The problem arises when the list contains a subdomain: if "docs.google" is on the list and the user is on "google.com", the extension should not treat that URL as being on the list.

I am attempting this by constructing a regular expression for each domain and subdomain. As I said, it works properly when the entry is a domain (as opposed to a subdomain), but when I test it with subdomain entries it does not seem to work. I assume the issue is with how I constructed the RegEx. Does anything stand out? Thank you in advance!

let onDomainList = false;
for(let i = 0; i < domainListLength-1; i++){
    if(!domainList[i].includes(".")){ //if this domain is not a subdomain
        let strPattern = "^https://www\\." + list.domainList[i].replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&') + "|https://[a-z_]+\\." + list.domainList[i].replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&');
        let domainRegEx = new RegExp(strPattern,'i');
        if(domainRegEx.test(activeTab.url)){
            onDomainList = true;
            execute_script(activeTab);
        }
    } else{ //if this domain is a subdomain
        let strPattern = "^https://www\\." + list.domainList[i].replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&');
        let domainRegEx = new RegExp(strPattern,'i');
        if(domainRegEx.test(activeTab.url)){
            onDomainList = true;
            execute_script(activeTab);
        }
    }
}

EDIT: Changed the RegEx to what Wiktor Stribizew suggested, but subdomains are still not detected.


Solution

  • Here is a fixed snippet:

    let onDomainList = false;
    for (let i = 0; i < domainListLength - 1; i++) {
      if (!domainList[i].includes(".")) { //if this domain is not a subdomain
        let strPattern = "^https://www\\." + domainList[i].replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&') + "|https://[a-z_]+\\." + domainList[i].replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&');
        let domainRegEx = new RegExp(strPattern, 'i');
        if (domainRegEx.test(activeTab.url)) {
          onDomainList = true;
          execute_script(activeTab);
        }
      } else { //if this domain is a subdomain
        let strPattern = "^https://(?:[^\\s/]*\\.)?" + domainList[i].replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&');
        let domainRegEx = new RegExp(strPattern, 'i');
        if (domainRegEx.test(activeTab.url)) {
          onDomainList = true;
          execute_script(activeTab);
        }
      }
    }
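
To check the subdomain branch in isolation, the pattern can be pulled into a small helper and run against a few URLs. This is only a minimal sketch: the names escapeRegExp and buildSubdomainRegex are hypothetical (they are not part of the extension code above), and the URLs are illustrative.

```javascript
// Hypothetical helper: escape regex metacharacters in a list entry,
// using the same replace() call as the snippet above.
function escapeRegExp(text) {
  return text.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&');
}

// Hypothetical helper: build the pattern used in the subdomain branch.
// (?:[^\s/]*\.)? optionally matches any leading labels such as "www.".
function buildSubdomainRegex(entry) {
  return new RegExp("^https://(?:[^\\s/]*\\.)?" + escapeRegExp(entry), 'i');
}

const docsGoogle = buildSubdomainRegex("docs.google");

console.log(docsGoogle.test("https://docs.google.com/document")); // true
console.log(docsGoogle.test("https://www.docs.google.com/"));     // true
console.log(docsGoogle.test("https://google.com/"));              // false
console.log(docsGoogle.test("https://www.google.com/"));          // false
```

Note that the pattern has no boundary after the entry, so a host such as docs.googleother.com would also match, and http:// URLs are not matched at all. Whether that matters depends on how the list is meant to be used.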
    

    Notes:

    • Because you build the pattern with the RegExp constructor from an ordinary string literal, every backslash that escapes a special character must be doubled. Here, / needs no escaping and . needs two backslashes: the "\\." string literal is actually the text \.
    • The list entries must be escaped before they are inserted into the pattern, hence domainList[i].replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&')
    • A / before ^ renders the regex useless: there can be no / before the start of the string, so /^ is a regex that never matches any string. The / regex delimiters should not be used in RegExp constructor notation.
    • Your subdomain regex does not match anything except https://www. followed by the domain from your list. To allow anything before the domain, replace www\. with (?:[^\s/]*\.)?, an optional non-capturing group ((?:...)?) that matches zero or more chars other than whitespace and / (the [^\s/]* negated character class) followed by a dot.
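
The first and third notes can be checked directly in a console. The following is a small illustrative sketch; the variable names and test strings are made up for the demonstration and do not come from the extension.

```javascript
// In a string literal, "\\." is the two characters \. which the regex
// engine then reads as an escaped (literal) dot.
const escapedDot = new RegExp("^www\\.google", "i");

// In a string literal, "\." is just ".", so the regex engine sees an
// unescaped dot that matches any character.
const unescapedDot = new RegExp("^www\.google", "i");

console.log("\\.".length);                    // 2
console.log(escapedDot.source);               // ^www\.google
console.log(escapedDot.test("wwwXgoogle"));   // false
console.log(unescapedDot.test("wwwXgoogle")); // true

// A literal "/" placed before "^" can never match, because nothing
// can precede the start of the string.
const slashBeforeAnchor = new RegExp("/^https://", "i");
console.log(slashBeforeAnchor.test("https://google.com")); // false
```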