regex grep posix pcre

Why does grep match lazily when invoked with -zoP and matching a backreference followed by a newline?


I have a file cases:

foo
bar
  cases:
    1: foo
    2: bar
baz
  cases:
    3: baz
quux

Because the indentation always returns to its previous level after each cases block, I want to list the blocks with grep -zoP '(\s*)cases:\n(\1.*\n)*' cases, but that outputs:

  cases:

  cases:

Whereas if I use grep -zoP '(\s*)cases:\n(\1.*\n){1,}' cases, I get the output I want:

  cases:
    1: foo
    2: bar
  cases:
    3: baz

This behavior doesn't appear with any similar regexp I've tried:

$ grep -o '\(foo\)bar\(\1\)*'<<<$'foobarfoofoofoofoo'
foobarfoofoofoofoo
$ grep -o '\(foo\)bar\(\1\)*'<<<$'foobarfoofoofoofoobax'
foobarfoofoofoofoo
$ grep -oP '(foo)bar(\1)*'<<<$'foobarfoofoofoofoobax'
foobarfoofoofoofoo
$ grep -zoP '(foo)bar(\1)*'<<<$'foobarfoofoofoofoobax'
foobarfoofoofoofoo
$ grep -zoP '(foo)\n*bar'<<<$'foo\n\n\n\n\n'
foo







$

Why does grep prefer to match my regexp 0 times?


Solution

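  • I thought it was a bug, but it was pointed out to me that \s, which is a synonym for the POSIX character class [:space:], corresponds to [ \t\n\r\f\v] in the C locale and therefore also matches the newline that precedes the indentation. With -z the whole file is a single record, so (\s*) captures that newline together with the two spaces. The backreference \1 then has to match a newline at the start of the next line, which fails, and (\1.*\n)* settles for zero repetitions. With {1,} zero repetitions are not allowed, so the engine backtracks and the match only succeeds when it starts at the first space, where \1 holds just the indentation.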
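
To see the captured newline and one way around it, here is a minimal sketch. It assumes GNU grep built with PCRE support (-P) and feeds the sample file on stdin instead of reading it from disk. The replacement class [ \t] is a suggestion of mine rather than part of the answer above, and the variable names are only illustrative.

```shell
# Sample input from the question, fed on stdin instead of from the file.
input='foo
bar
  cases:
    1: foo
    2: bar
baz
  cases:
    3: baz
quux'

# Original pattern. tr turns every newline into "$" and every NUL terminator
# that -z emits after a match into "|", so the extent of each match is visible.
shown=$(printf '%s\n' "$input" | grep -zoP '(\s*)cases:\n(\1.*\n)*' | tr '\n\0' '$|')
printf '%s\n' "$shown"

# Assumed fix: [ \t]* cannot consume the newline before the indentation,
# so \1 holds only the leading blanks and (\1.*\n)* can match the indented lines.
# tr -d removes the NUL terminators before the result is stored.
result=$(printf '%s\n' "$input" | grep -zoP '([ \t]*)cases:\n(\1.*\n)*' | tr -d '\0')
printf '%s\n' "$result"
```

With the original pattern each match starts with $, that is, with the newline that \s* consumed. With [ \t]* the second command prints both complete cases blocks without needing {1,}. PCRE's \h (horizontal whitespace) should work in place of [ \t] as well.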