Basically, I would like to validate a URL that does not have a subdomain. I can't seem to figure out the correct regex for it.
Examples of URLs that SHOULD match:
Examples of URLs that SHOULD NOT match:
Here, we would start with an expression that is bounded on the right by .com or .co.uk (and other TLDs, if desired). Then we would move left to collect all non-dot characters, add an optional www. and an optional http:// or https://, and finally add the start anchor ^. Because [^.]+ cannot cross a dot, this makes every subdomain fail:
^(https?:\/\/)?(www\.)?([^.]+)(\.com|\.co\.uk)(.+|)$
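To validate one URL at a time, a minimal sketch could use the same expression without the g and m flags. With the g flag, RegExp.prototype.test() remembers lastIndex between calls, which can make repeated checks on different strings give surprising results. The helper name isRootDomainUrl is hypothetical.

```javascript
// The expression from above, without the g/m flags, so test() keeps no
// lastIndex state between calls.
const ROOT_DOMAIN_URL = /^(https?:\/\/)?(www\.)?([^.]+)(\.com|\.co\.uk)(.+|)$/;

// Hypothetical helper: true when the URL has no subdomain
// (an optional "www." prefix is still allowed).
function isRootDomainUrl(url) {
  return ROOT_DOMAIN_URL.test(url);
}

const samples = [
  "example.com",                    // true
  "https://www.example.co.uk/page", // true
  "test.example.com",               // false
  "sub.test.example.com",           // false
];

for (const url of samples) {
  console.log(`${url} -> ${isRootDomainUrl(url)}`);
}
```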
Other TLDs can be added to the TLD capturing group:
(\.com|\.co\.uk|\.net|\.org|\.business|\.edu|\.careers|\.coffee|\.college)
And the expression can be modified to:
^(https?:\/\/)?(www\.)?([^.]+)(\.com|\.co\.uk|\.net|\.org|\.business|\.edu|\.careers|\.coffee|\.college)(.+|)$
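If the TLD list keeps growing, one option is to keep it in an array and build the alternation with the RegExp constructor. This is a sketch under that assumption; escapeRegExp is a hypothetical helper that escapes regex metacharacters, so that the dot in co.uk is matched literally instead of as "any character".

```javascript
// Assumption: the allowed TLDs are kept in an array rather than typed
// into the pattern by hand.
const tlds = ["com", "co.uk", "net", "org", "business", "edu", "careers", "coffee", "college"];

// Hypothetical helper: escapes regex metacharacters in a literal string.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Builds "\.com|\.co\.uk|\.net|..." from the array.
const tldGroup = tlds.map((tld) => "\\." + escapeRegExp(tld)).join("|");

// Same structure as the expression above, with the generated TLD group.
const EXTENDED_URL = new RegExp(
  `^(https?:\\/\\/)?(www\\.)?([^.]+)(${tldGroup})(.+|)$`
);

console.log(EXTENDED_URL.test("example.org"));      // true
console.log(EXTENDED_URL.test("blog.example.org")); // false
```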
I can't think of a way to make the TLD part much more flexible, since this is a validation expression. For instance, if we simplified it to:
^(https?:\/\/)?(www\.)?([^.]+)(\.[a-z]+)(\.uk?)?[a-z?=\/]+$
it might work for the URLs listed in the question, but it would also pass:
example.example
which is invalid. We could only use this expression:
^(https?:\/\/)?(www\.)?([^.]+)(\.[a-z]+)(\.uk?)?[a-z?=\/]+$
if we already knew that the input was a valid URL.
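A quick way to see the difference is to run both expressions over the same inputs. In this sketch, only the explicit TLD list rejects example.example; the looser pattern accepts it because \.[a-z]+ and the trailing [a-z?=\/]+ will take any letters after the first dot.

```javascript
// Strict: explicit TLD alternation. Loose: any letters after the first dot.
const strict = /^(https?:\/\/)?(www\.)?([^.]+)(\.com|\.co\.uk)(.+|)$/;
const loose = /^(https?:\/\/)?(www\.)?([^.]+)(\.[a-z]+)(\.uk?)?[a-z?=\/]+$/;

for (const input of ["example.com", "example.co.uk", "example.example"]) {
  console.log(input, "strict:", strict.test(input), "loose:", loose.test(input));
}
// example.com strict: true loose: true
// example.co.uk strict: true loose: true
// example.example strict: false loose: true
```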
This snippet just shows how the capturing groups work:
const regex = /^(https?:\/\/)?(www\.)?([^.]+)(\.com|\.co\.uk)(.+|)$/gm;
const str = `example.com
www.example.com
example.co.uk
example.com/page
example.com?key=value
test.example.com
sub.test.example.com`;
let m;
while ((m = regex.exec(str)) !== null) {
  // This is necessary to avoid infinite loops with zero-width matches
  if (m.index === regex.lastIndex) {
    regex.lastIndex++;
  }
  // The result can be accessed through the `m`-variable.
  m.forEach((match, groupIndex) => {
    console.log(`Found match, group ${groupIndex}: ${match}`);
  });
}
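As an alternative to the exec() loop, String.prototype.matchAll() (which requires the g flag) can iterate over the matches, and named groups make the role of each capturing group explicit. The group names scheme, www, name, tld and rest are illustrative choices, not part of the original expression.

```javascript
// Same pattern as the snippet, with illustrative group names.
// g is required by matchAll(); m lets ^ and $ match at every line.
const namedRegex =
  /^(?<scheme>https?:\/\/)?(?<www>www\.)?(?<name>[^.]+)(?<tld>\.com|\.co\.uk)(?<rest>.+|)$/gm;

const input = `example.com
https://www.example.co.uk/page
test.example.com`;

for (const m of input.matchAll(namedRegex)) {
  const { scheme, www, name, tld, rest } = m.groups;
  console.log({ full: m[0], scheme, www, name, tld, rest });
}
```

Groups that did not take part in a match, such as scheme for example.com, come back as undefined, and test.example.com produces no match at all.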
If this expression isn't what you need, it can be modified or tested on regex101.com.