javascript regex quantifiers

What's the difference between these two regular expressions? (Understanding the ? quantifier)


In the book Eloquent JavaScript, chapter 9 (Regular Expressions), under the section "Parsing an INI File", there's an example that includes a regular expression I don't understand at all. The author is trying to parse the following content:

searchengine=http://www.google.com/search?q=$1
spitefulness=9.7

; comments are preceded by a semicolon...
; each section concerns an individual enemy
[larry]
fullname=Larry Doe
type=kindergarten bully
website=http://www.geocities.com/CapeCanaveral/11451

[gargamel]
fullname=Gargamel
type=evil sorcerer
outputdir=/home/marijn/enemies/gargamel

The rules for this format state that

Blank lines and lines starting with semicolons are ignored.

The code that parses this content goes over every line in the file. To handle comments, he includes this expression:

^\s*(;.*)?
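A minimal sketch of such a line-by-line parser, written for illustration and not copied from the book: the function name parseIni and the result shape (top-level keys plus one nested object per [section]) are assumptions. The comment check here adds a $ anchor, which the expression as quoted above does not have, so that a line is skipped only if the whole line is blank or a comment.

```javascript
// Hypothetical line-by-line INI parser (illustrative sketch, not the book's code).
function parseIni(text) {
  const result = {};    // top-level settings
  let section = result; // object that receives the current name=value pairs
  for (const line of text.split(/\r?\n/)) {
    let match;
    if ((match = line.match(/^(\w+)=(.*)$/))) {
      // name=value pair: store it in the current section
      section[match[1]] = match[2];
    } else if ((match = line.match(/^\[(.*)\]$/))) {
      // [section] header: start a new nested object
      section = result[match[1]] = {};
    } else if (!/^\s*(;.*)?$/.test(line)) {
      // Not a setting, a header, a blank line or a comment
      throw new Error(`Line '${line}' is not valid.`);
    }
    // Blank lines and comment lines fall through and are ignored.
  }
  return result;
}

const sample = [
  "spitefulness=9.7",
  "",
  "; comments are preceded by a semicolon...",
  "[larry]",
  "fullname=Larry Doe",
].join("\n");

console.log(parseIni(sample));
```

For this sample, the result holds spitefulness "9.7" at the top level and a larry object with fullname "Larry Doe"; the blank line and the comment line are skipped because the anchored comment pattern matches them.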

As far as I understand, this expression processes lines that may start with a sequence of

white space characters, including space, tab, form feed, line feed and other Unicode spaces

(source) until a semicolon ; appears, followed by a sequence of "any single character except line terminators: \n, \r, \u2028 or \u2029.". All of this is restricted to {0,1} occurrences.
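The description of \s and . above can be checked in any JavaScript console; this small sketch uses a handful of sample characters chosen for illustration.

```javascript
// \s covers ASCII whitespace, line terminators and other Unicode spaces.
const whitespaceSamples = [" ", "\t", "\f", "\n", "\u00a0", "\u2028"];
console.log(whitespaceSamples.every((ch) => /^\s$/.test(ch))); // true

// . matches any single character except the line terminators \n, \r, \u2028, \u2029
// (when the s flag is not set).
const terminators = ["\n", "\r", "\u2028", "\u2029"];
console.log(terminators.some((ch) => /^.$/.test(ch))); // false
console.log(/^.$/.test(";")); // true
```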

I don't get the point of the ? quantifier here. Using regex101, I couldn't find any case where not limiting the number of occurrences of the matched string would be a problem. Why is that expression different from this one:

^\s*(;.*)

Thanks in advance.


Solution

  • The ^\s*(;.*) requires a ;, so it cannot match a blank line.

    The ^\s*(;.*)? can match a blank line, since it does not require a ;.

    The common part is ^\s* - start of line (or string) and then zero or more whitespaces.

    Then, 1) (;.*) matches a ; (exactly one, obligatorily) followed by zero or more characters other than line terminators, and 2) (;.*)? matches an optional sequence of a ; followed by zero or more characters other than line terminators. (...)? is an optional group: ? is a quantifier matching one or zero occurrences of the quantified atom, where the atom can be a symbol, a character class, or a group.

    Also, note that \s matches LF and CR characters, which means that if the input is a text containing multiple lines and the multiline (m) flag is on, ^\s* may match across several lines up to the first non-whitespace character.
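A quick way to see the difference is to run both patterns on a few lines like those in the sample file. The two $-anchored variants are added here only for comparison and are not part of the expressions discussed above. Without an end anchor, ^\s*(;.*)? can succeed with an empty match at the start of any line, since both \s* and the optional group may match nothing.

```javascript
const lines = ["", "   ", "; comments are preceded by a semicolon...", "fullname=Larry Doe"];

const optional = /^\s*(;.*)?/;          // the group may be absent
const required = /^\s*(;.*)/;           // the group must be present
const optionalAnchored = /^\s*(;.*)?$/; // whole line must be blank or a comment
const requiredAnchored = /^\s*(;.*)$/;  // whole line must be a comment

for (const line of lines) {
  console.log(
    JSON.stringify(line),
    optional.test(line),
    required.test(line),
    optionalAnchored.test(line),
    requiredAnchored.test(line),
  );
}
```

Only the anchored optional version, ^\s*(;.*)?$, accepts exactly the blank and comment lines while rejecting fullname=Larry Doe; the unanchored optional version accepts all four lines, and the versions without ? reject the two blank ones.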
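The multi-line behaviour of \s can be seen on a short made-up string. With the m flag, ^ can match right after every line terminator, but \s* still consumes the line terminators themselves, so a match that starts at an empty line keeps going into the following lines.

```javascript
const text = "key=value\n\n   \n; comment";

// Collect every match; the m flag lets ^ match at the start of each line.
const matches = [...text.matchAll(/^\s*(;.*)?/gm)].map((m) => m[0]);
console.log(matches.map((s) => JSON.stringify(s)));
```

Here the first match is empty (at the start of key=value), and the second is "\n   \n; comment", which starts at the empty second line and runs through the whitespace-only line into the comment. Splitting the input into lines before testing, as a line-by-line parser does, avoids this.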