
RFC regular expression operators

I recently read an RFC document and noticed that the regex operators it uses don't match the commonly known ones. For example:

date-time = [ day-of-week "," ] date time [CFWS]
year = (FWS 4*DIGIT FWS) / obs-year

In regular expressions, square brackets define a character class that matches exactly one of the characters inside them. But in the RFC they seem to mean "optional". The same goes for the asterisk, which in a regex means the preceding token occurs zero or more times. In the example we have

4*DIGIT

which, it's not hard to guess, means 4 occurrences of the DIGIT token.

How should I interpret the operators used in RFC documents? Is there a document that describes what they mean?


  • The document (I believe) you're looking at, RFC 2822, says this:

    1.2.2. Syntactic notation

    This standard uses the Augmented Backus-Naur Form (ABNF) notation specified in [RFC2234] for the formal definitions of the syntax of messages.

    So, yes, the syntax is defined in RFC 2234, and it is not regular expression syntax.
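To see why reading the grammar as a regex goes wrong, here is a small Python illustration (a sketch, not taken from either RFC) that hands the ABNF fragment `4*DIGIT` straight to Python's `re` module. The regex engine reads `4*` as "zero or more `4` characters" followed by the literal letters `DIGIT`, which has nothing to do with digits:

```python
import re

# Feeding ABNF text directly to a regex engine changes its meaning:
# in regex syntax "4*" means "zero or more '4' characters", and the
# letters D, I, G, I, T are then matched literally.
pattern = re.compile(r"4*DIGIT")

print(bool(pattern.fullmatch("DIGIT")))      # True: zero 4s, then "DIGIT"
print(bool(pattern.fullmatch("4444DIGIT")))  # True
print(bool(pattern.fullmatch("2003")))       # False: an actual year does not match
```

A four-digit year such as 2003 fails to match, even though that is exactly the kind of value the ABNF rule is meant to describe.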

    A few sections of RFC 2234 that are relevant to the block you've quoted:

    3.5 Sequence Group

    Elements enclosed in parentheses are treated as a single element, whose contents are STRICTLY ORDERED.
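As a rough illustration of how a sequence group relates to regex syntax, the sketch below hand-translates the group from the `year` rule in the question into a Python regular expression. It rests on simplifying assumptions that are not in RFC 2822: `FWS` is approximated as one or more spaces or tabs (line folding is ignored), and the `/ obs-year` alternative is left out (`/` is ABNF's alternation operator, RFC 2234 section 3.2). `FWS`, `DIGIT` and `YEAR` are just local names chosen for this example.

```python
import re

# Simplified stand-ins (assumptions, not the full RFC 2822 definitions):
FWS = r"[ \t]+"    # folding white space approximated as 1+ spaces/tabs
DIGIT = r"[0-9]"   # ABNF core rule DIGIT = %x30-39

# ABNF: (FWS 4*DIGIT FWS)
# The parentheses become a non-capturing group, and the elements inside
# keep their order: FWS, then four or more DIGITs, then FWS.
YEAR = re.compile(rf"(?:{FWS}{DIGIT}{{4,}}{FWS})")

print(bool(YEAR.fullmatch(" 2003 ")))  # True
print(bool(YEAR.fullmatch("2003")))    # False: no surrounding whitespace
print(bool(YEAR.fullmatch(" 03 ")))    # False: fewer than four digits
```

With this approximation, each FWS needs at least one whitespace character, so a bare "2003" is rejected.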

    3.6 Variable Repetition

    The operator "*" preceding an element indicates repetition. The full form is:

    <a>*<b>element

    where <a> and <b> are optional decimal values, indicating at least <a> and at most <b> occurrences of element.
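Read literally, this means `4*DIGIT` is "at least four DIGITs, with no upper limit", not exactly four (RFC 2234 uses 0 and infinity as the defaults for a missing <a> or <b>). The hypothetical helper `abnf_repeat` below sketches how the `<a>*<b>element` form maps onto a regex `{m,n}` quantifier. It assumes the element has already been translated into a regex fragment, such as `[0-9]` for DIGIT:

```python
import re
from typing import Optional


def abnf_repeat(element: str, a: Optional[int] = None, b: Optional[int] = None) -> str:
    """Hypothetical helper: turn ABNF "<a>*<b>element" into a regex fragment.

    `element` must already be a regex for one element (e.g. "[0-9]" for DIGIT).
    A missing <a> defaults to 0 and a missing <b> means "no upper bound".
    """
    low = 0 if a is None else a
    high = "" if b is None else str(b)
    # Non-capturing group so the quantifier applies to the whole element.
    return f"(?:{element}){{{low},{high}}}"


DIGIT = "[0-9]"
four_or_more = abnf_repeat(DIGIT, a=4)    # ABNF: 4*DIGIT
exactly_three = abnf_repeat(DIGIT, 3, 3)  # ABNF: 3*3DIGIT
any_number = abnf_repeat(DIGIT)           # ABNF: *DIGIT

print(four_or_more)                               # (?:[0-9]){4,}
print(bool(re.fullmatch(four_or_more, "2003")))   # True
print(bool(re.fullmatch(four_or_more, "12345")))  # True
print(bool(re.fullmatch(four_or_more, "203")))    # False
print(bool(re.fullmatch(exactly_three, "123")))   # True
print(bool(re.fullmatch(any_number, "")))         # True
```

Wrapping the element in a group mirrors how ABNF treats a parenthesized sequence as a single element, so the same helper works for multi-part elements such as `-[0-9]`.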

    3.8 Optional Sequence

    Square brackets enclose an optional element sequence: