c++ regex url boost-regex authority

Regex matching URL authority parts


I need to match these parts of the following string:

(user)@(hostname):(port)

The user and port parts are optional. At first I managed it with this regular expression:

(?:([^@]*)@)?([^\:]+)(?:\:(\d+))?

For foo@bar:80 this captures:

foo
bar
80

But when it comes to an IPv6 host like foo@[2001:0db8:85a3:08d3:1319:8a2e:0370:7344]:80, the preceding regex doesn't work as expected, because the host group stops at the first colon:

foo
[2001
0

So now I'm looking for a regular expression that also matches hosts enclosed in square brackets (which may contain colons), but captures them without the brackets. :) I've done that with the following regex:

(?:([^@]*)@)(?:\[(.+)\]|([^:]+))(?:\:(\d+))?

foo
2001:0db8:85a3:08d3:1319:8a2e:0370:7344
<empty>
80

But this is ugly, because either group 2 or group 3 will always be empty. Is there any way to combine them into a single capture group?

I'm using boost::regex, which uses Perl-compatible regex syntax by default, as far as I know.

Thanks and regards

reeaal


Solution

  • (?:([^@]*)@)(\[.+\]|[^:]+)(?:\:(\d+))?
    

    The host is now always in group 2, but you'll have to strip off the [ and ] yourself if it's an IPv6 address. That should be fairly trivial, though.

    You could also make the [ and ] optional on either side of the host and use lookaround assertions to keep them out of the capture, but that's REALLY ugly. Your fellow programmers will thank you if you just KISS and use the pattern above. Still, here's that option:

    (?:([^@]*)@)\[?((?<=\[).+(?=\])|[^:]+)\]?(?:\:(\d+))?
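
    For the simpler of the two options (match the brackets, then strip them in code), a minimal C++ sketch might look like the following. A few things here are assumptions rather than part of the original post: the names Authority and parse_authority are made up for illustration, std::regex is swapped in for boost::regex only so the example has no external dependency, the user group is made optional (trailing ?) as the question requires, and the host alternative is written without an inner capture group so the port lands in group 3.

```cpp
#include <regex>
#include <string>

// Hypothetical result type for this sketch; not from the original post.
struct Authority {
    std::string user;  // empty if no "user@" part was present
    std::string host;  // IPv6 brackets already stripped
    std::string port;  // empty if no ":port" part was present
};

// Parses "[user@]host[:port]" where host may be a bracketed IPv6 literal.
// Returns false if the whole input does not match the pattern.
bool parse_authority(const std::string& input, Authority& out) {
    // Group 1: user (optional), group 2: host (with brackets if IPv6),
    // group 3: port (optional).
    static const std::regex re(R"((?:([^@]*)@)?(\[.+\]|[^:]+)(?::(\d+))?)");

    std::smatch m;
    if (!std::regex_match(input, m, re)) {
        return false;
    }

    std::string host = m[2].str();
    // Strip the surrounding [ and ] of an IPv6 literal.
    if (host.size() >= 2 && host.front() == '[' && host.back() == ']') {
        host = host.substr(1, host.size() - 2);
    }

    out.user = m[1].str();  // str() yields "" for an unmatched group
    out.host = host;
    out.port = m[3].str();
    return true;
}
```

    Calling parse_authority on foo@[2001:0db8:85a3:08d3:1319:8a2e:0370:7344]:80 should fill in foo, the bare address and 80, and plain foo@bar:80 goes through the second alternative unchanged. The lookaround variant is not shown this way because std::regex's default ECMAScript grammar has no lookbehind; it would need boost::regex or another Perl-compatible engine.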