c++ regex hostname

Regex to match domains with 3 dots


The purpose of the regex is to scan a string and list all hostnames in it, keeping only the part of each hostname up to the third dot counting from the right.

It works, but the regex selects the left part of each hostname, not the right.

Regex

((([a-zA-Z0-9]{1,63}|[a-zA-Z0-9][a-zA-Z0-9-]{0,61}[a-zA-Z0-9])\.){1,3}[a-zA-Z]{2,63})

Now (matched part in brackets)

[site.com] [1.site.com] [2.1.site.com] [3.2.1.site].com 4.[3.2.1.site].com 5.4.[3.2.1.site].com

Fixed (matched part in brackets)

[site.com] [1.site.com] [2.1.site.com] 3.[2.1.site.com] 4.3.[2.1.site.com] 5.4.3.[2.1.site.com]


Solution

  • If you want to use your regex for that, you need to replace the + quantifier with a bounded {0,3} quantifier, and add a \b word boundary followed by a (?!\.) negative lookahead at the end. This makes sure the match ends at a trailing word boundary and that there is no dot after it:

    (([a-zA-Z0-9]{1,63}|[a-zA-Z0-9][a-zA-Z0-9-]{0,61}[a-zA-Z0-9])\.){0,3}[a-zA-Z]{2,63}\b(?!\.)
                                                                    ^^^^^              ^^^^^^^^
    

    See the regex demo

    Note that the + quantifier matches 1 or more occurrences of the quantified subpattern, while the {0,3} limiting (bounded) quantifier matches 0 to 3 occurrences only.

    In C++, you may use a raw string literal (R"(<PATTERN>)") to define the regex to avoid overescaping:

    std::regex rx(R"((([a-zA-Z0-9]{1,63}|[a-zA-Z0-9][a-zA-Z0-9-]{0,61}[a-zA-Z0-9])\.){0,3}[a-zA-Z]{2,63}\b(?!\.))");
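
To build the list of hostnames the question asks for, one option is to walk the input with std::sregex_iterator and collect each match. This is only a minimal sketch under that assumption, not part of the original answer; the helper names hostname_regex and find_hostnames are hypothetical.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns the pattern from the answer above.
// Up to three dot-terminated labels, then an alphabetic TLD that ends
// at a word boundary and is not followed by another dot.
const std::regex& hostname_regex() {
    static const std::regex rx(
        R"((([a-zA-Z0-9]{1,63}|[a-zA-Z0-9][a-zA-Z0-9-]{0,61}[a-zA-Z0-9])\.){0,3}[a-zA-Z]{2,63}\b(?!\.))");
    return rx;
}

// Hypothetical helper: collects every non-overlapping match of rx in text,
// scanning from left to right.
std::vector<std::string> find_hostnames(const std::string& text,
                                        const std::regex& rx) {
    std::vector<std::string> result;
    for (std::sregex_iterator it(text.begin(), text.end(), rx), end;
         it != end; ++it) {
        result.push_back(it->str());  // full match, i.e. group 0
    }
    return result;
}
```

Calling find_hostnames("4.3.2.1.site.com", hostname_regex()) should yield a single entry, 2.1.site.com. Because {0,3} also allows zero labels, a plain word such as hello matches as well; switching to {1,3} is one way to require at least one dot.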