The purpose of the regex is to take a string and make a list of all hostnames, but keep at most three dots in each one, counted from the right.
It works, but you can see that the script selects the left part of the hostname, not the right part.
Regex
((([a-zA-Z0-9]{1,63}|[a-zA-Z0-9][a-zA-Z0-9-]{0,61}[a-zA-Z0-9])\.){1,3}[a-zA-Z]{2,63})
Now
site.com 1.site.com 2.1.site.com 3.2.1.site.com 4.3.2.1.site.com 5.4.3.2.1.site.com
Fixed
site.com 1.site.com 2.1.site.com 3.2.1.site.com 4.3.2.1.site.com 5.4.3.2.1.site.com
If you want to use your regex for that, you need to limit the + quantifier to just {0,3} occurrences, and add a \b word boundary and a (?!\.) lookahead at the end to make sure the match ends at a trailing word boundary and that there is no dot after it:
(([a-zA-Z0-9]{1,63}|[a-zA-Z0-9][a-zA-Z0-9-]{0,61}[a-zA-Z0-9])\.){0,3}[a-zA-Z]{2,63}\b(?!\.)
Note that the + quantifier matches 1 or more occurrences of the quantified subpattern, while the {0,3} limiting (bound) quantifier allows matching 0 to 3 occurrences only.
In C++, you may use a raw string literal (R"(<PATTERN>)") to define the regex and avoid over-escaping:
std::regex rx(R"((([a-zA-Z0-9]{1,63}|[a-zA-Z0-9][a-zA-Z0-9-]{0,61}[a-zA-Z0-9])\.){0,3}[a-zA-Z]{2,63}\b(?!\.))");