Tags: regex, coding-style, delimiter, bnf

What do you call the inner part of a regex (the part between the delimiters)?


What do you call the "inner part" of a regular expression, that is, the expression without its delimiters?

For example:

Given the regular expressions /\d+/ and #(hello)#, we can break each one down into three parts:

  • / + \d+ + /
  • # + (hello) + #

Everyone calls / or # the delimiter.

What do you call the inner part, i.e. the \d+ or (hello) part?

In this BNF https://www2.cs.sfu.ca/~cameron/Teaching/384/99-3/regexp-plg.html, referenced here https://stackoverflow.com/a/265466/1315009, they seem to use "regular expression" for the inner part. If that is correct, what do you call the regular expression together with its delimiters?

I'm asking because of Clean Code rules: I'm writing a tokenizer and need clear, proper names for both the "full thing" and the "inner thing".


Solution

  • The regex delimiters delimit the following parts:

    <action>/<pattern>(/<substitution>)/<modifiers>
    

    Action

    This part of the regex delimiter construction carries implicit (no character) or explicit (a character) information about what the regex will do: match, replace, and sometimes even whether it operates on the entire file, as in Vim. Actions are also called commands (or operators) in the context of POSIX tools. The usual action characters are s and m, which stand for substitution and match.

    Pattern

    The second part, which you called the inner part, is called a pattern (see the perlop reference). When describing the $var =~ m/mushroom/ expression, this reference explains:

    The portion enclosed in '/' characters denotes the characteristic we are looking for. We use the term pattern for it.

    So, when we say "regex" or "regexp" we basically refer to the regular expression pattern.

    Substitution

    This part exists only in substitution constructions, which are prefixed with the s action/command. Substitution pattern syntax is very different from regex pattern syntax: it can usually contain named or numbered backreferences, escape sequences that cancel the backreference syntax (cf. "dollar escaping"), and sometimes case-changing operators (like \l, \L...\E, \u and \U...\E).

    Modifiers

    Also called flags, these parts help "fine-tune" how regex engines match patterns. The most common modifiers are i (case-insensitive matching), g (global matching), and s (singleline/dotall, which makes . match across line breaks in most NFA regex engines; Onigmo/Oniguruma uses m for this instead).
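
For the tokenizer mentioned in the question, the four parts described above could be modelled roughly as follows. This is only a minimal sketch in Python. The names RegexLiteral, split_on_delimiter and parse_regex_literal are hypothetical and chosen for illustration; none of them comes from the perlop reference or from any library. It also assumes that the opening and closing delimiter are the same single character (paired delimiters such as { } are not handled) and that only the m and s actions exist.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RegexLiteral:
    """Hypothetical name for the "full thing": delimiters plus all parts."""
    action: str                  # "" (implicit match), "m" or "s"
    delimiter: str               # e.g. "/" or "#"
    pattern: str                 # the "inner part"
    substitution: Optional[str]  # present only for the "s" action
    modifiers: str               # flags such as "i", "g", "s"


def split_on_delimiter(text: str, delimiter: str) -> List[str]:
    """Split text on every unescaped occurrence of delimiter."""
    parts, current, i = [], [], 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            # Keep the escape sequence intact, so "\/" stays inside its part.
            current.append(text[i:i + 2])
            i += 2
        elif ch == delimiter:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    parts.append("".join(current))
    return parts


def parse_regex_literal(literal: str) -> RegexLiteral:
    # Assumption: an explicit action is a leading "m" or "s" that is
    # followed by a non-alphanumeric character (the delimiter).
    action = ""
    if len(literal) >= 2 and literal[0] in "ms" and not literal[1].isalnum():
        action = literal[0]

    rest = literal[len(action):]
    if not rest or rest[0].isalnum() or rest[0].isspace() or rest[0] == "\\":
        raise ValueError(f"no usable delimiter in {literal!r}")
    delimiter = rest[0]

    # Everything after the opening delimiter: pattern, optional substitution,
    # then the modifiers that trail the closing delimiter.
    parts = split_on_delimiter(rest[1:], delimiter)
    expected = 3 if action == "s" else 2
    if len(parts) != expected:
        raise ValueError(f"expected {expected} parts in {literal!r}, got {len(parts)}")

    modifiers = parts[-1]
    if modifiers and not modifiers.isalpha():
        raise ValueError(f"invalid modifiers {modifiers!r}")

    substitution = parts[1] if action == "s" else None
    return RegexLiteral(action, delimiter, parts[0], substitution, modifiers)


print(parse_regex_literal(r"/\d+/").pattern)       # \d+
print(parse_regex_literal("#(hello)#").pattern)    # (hello)
print(parse_regex_literal("s/cat/dog/gi"))
```

With this naming, the "inner thing" is the pattern field and the "full thing" is the whole literal. An escaped delimiter inside the pattern (such as \/) is kept as part of the pattern rather than treated as a separator.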