
Match IRC channel with regular expression (RFCs 2811-2813)


I know this question has been answered in accordance with RFC 1459. But how would one go about using regular expressions to match channels in accordance with RFCs 2811-2813?

RFC 2811 states:

Channels names are strings (beginning with a '&', '#', '+' or '!' character) of length up to fifty (50) characters. Channel names are case insensitive.

Apart from the the requirement that the first character being either '&', '#', '+' or '!' (hereafter called "channel prefix"). The only restriction on a channel name is that it SHALL NOT contain any spaces (' '), a control G (^G or ASCII 7), a comma (',' which is used as a list item separator by the protocol). Also, a colon (':') is used as a delimiter for the channel mask. The exact syntax of a channel name is defined in "IRC Server Protocol" [IRC-SERVER].

And supplementing that, RFC 2812 states:

channel    =  ( "#" / "+" / ( "!" channelid ) / "&" ) chanstring
              [ ":" chanstring ]
chanstring =  %x01-07 / %x08-09 / %x0B-0C / %x0E-1F / %x21-2B
chanstring =/ %x2D-39 / %x3B-FF
                ; any octet except NUL, BELL, CR, LF, " ", "," and ":"
channelid  = 5( %x41-5A / digit )   ; 5( A-Z / 0-9 )

Solution

  • To show you how to create a composite regex, I'll make a simplified example.

    Suppose a channel name is a # or & prefix followed by 1 to 20 lowercase letters. A regex matching this might be:

    [#&][a-z]{1,20}
    

    That is, a # or &, followed by 1 to 20 letters. Since the channelid doesn't follow the same pattern, a regex for that might be:

    ![A-Z0-9]{5}
    

    which is a ! followed by exactly 5 uppercase letters or digits. For a complete regex that matches either of these, you combine them with (...|...), like this:

    ([#&][a-z]{1,20}|![A-Z0-9]{5})
    

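    You can then replace the simplified character classes with the exact ones from the grammar for the channel name pattern you want to match. Note that in the RFC 2812 grammar the channelid is itself followed by a chanstring, so the ! branch needs that part appended as well.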
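
As a rough sketch of how the same composition might look for the RFC 2812 grammar quoted in the question, here is a Python version that works on bytes, since the grammar is defined in terms of octets. The names `CHANSTRING`, `CHANNEL_ID`, `CHANNEL_RE` and `is_channel` are made up for this illustration. It rests on three assumptions that are not settled by the quoted text: `chanstring` is read as one or more octets (the ABNF as written defines a single octet), BELL is excluded as the ABNF comment and RFC 2811 say (although the range `%x01-07` includes it), and the 50-character limit is applied to the whole string including the prefix and any `:` suffix.

```python
import re

# ASSUMPTION: chanstring is treated as one or more octets; the ABNF as
# written defines only a single octet.
# ASSUMPTION: BELL (0x07) is excluded, following the ABNF comment and
# RFC 2811, even though the range %x01-07 would allow it.
# Excluded octets: NUL, BELL, LF, CR, space, comma, colon.
CHANSTRING = rb"[^\x00\x07\x0A\x0D ,:]+"

# channelid = 5( A-Z / 0-9 )
CHANNEL_ID = rb"[A-Z0-9]{5}"

# channel = ( "#" / "+" / ( "!" channelid ) / "&" ) chanstring [ ":" chanstring ]
CHANNEL_RE = re.compile(
    rb"(?:[#+&]|!" + CHANNEL_ID + rb")"
    + CHANSTRING
    + rb"(?::" + CHANSTRING + rb")?"
)


def is_channel(name: bytes) -> bool:
    """Return True if name matches the grammar above and is at most 50 octets.

    ASSUMPTION: the 50-character limit from RFC 2811 is applied to the
    whole name, prefix and optional ":" suffix included.
    """
    return len(name) <= 50 and CHANNEL_RE.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in (b"#python", b"!ABCDEsafe", b"#chan:mask", b"#foo bar", b"!abcdesafe"):
        print(candidate, is_channel(candidate))
```

The length check is kept outside the regex because the prefix, the optional channelid and the optional `:` part all count toward the limit, which is awkward to express with a single quantifier. If you need to match `str` instead of `bytes`, the same pattern can be written without the `b` prefixes, but then the character class applies to code points rather than octets.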