I came across a situation where I wanted to use the non-greedy quantifier .*?
in a regex pattern.
set input "Device ID: HOST1
Interface: GigabitEthernet0/1, Port ID (outgoing port): GigabitEthernet2/43
Device ID: HOST2
Entry address(es):
Interface: GigabitEthernet0/2, Port ID (outgoing port): GigabitEthernet2/43
"
puts "======== Non-Greedy regex starting with some other patterns ========"
puts [ regexp -inline {Device\s+ID:.*?outgoing\s+port\):\s+} $input]
puts "======== Non-Greedy regex at first ========"
puts [ regexp -inline {.*?outgoing\s+port\):\s+} $input]
Output:
======== Non-Greedy regex starting with some other patterns ========
{Device ID: HOST1
Interface: GigabitEthernet0/1, Port ID (outgoing port): GigabitEthernet2/43
Device ID: HOST2
Entry address(es):
Interface: GigabitEthernet0/2, Port ID (outgoing port): }
======== Non-Greedy regex at first ========
{Device ID: HOST1
Interface: GigabitEthernet0/1, Port ID (outgoing port): }
The pattern .*?outgoing\s+port\):\s+
stops at the first occurrence, but the pattern Device\s+ID:.*?outgoing\s+port\):\s+
does not stop at the first occurrence of the match.
Why is the behavior of the non-greedy match affected by the placement of the atoms?
It's not that well documented (IMO) but the re_syntax man page says this about greedy/non-greedy preference:
A branch has the same preference as the first quantified atom in it which has a preference.
(emphasis mine)
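So if you have .*
as the first quantifier, the whole RE will be greedy,
and if you have .*?
as the first quantifier, the whole RE will be non-greedy.
In your first pattern, the first quantified atom is the greedy \s+ in Device\s+ID:, so the whole RE is greedy and the later .*? does not stop it at the first occurrence.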
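One way to apply this rule to the pattern from the question is to make its first quantifier non-greedy as well. The following is a minimal sketch, assuming the goal is to stop at the first "outgoing port):". The variable name match and the choice to make every quantifier non-greedy are mine and not from the question.

```tcl
# Sample input copied from the question (two device entries).
set input "Device ID: HOST1
Interface: GigabitEthernet0/1, Port ID (outgoing port): GigabitEthernet2/43
Device ID: HOST2
Entry address(es):
Interface: GigabitEthernet0/2, Port ID (outgoing port): GigabitEthernet2/43
"

# \s+? is now the first quantified atom with a preference, so the
# whole branch prefers the shortest match instead of the longest.
set match [lindex [regexp -inline {Device\s+?ID:.*?outgoing\s+?port\):\s+?} $input] 0]
puts $match
```

With this change the match should end at the first entry's "outgoing port): " rather than running on into the HOST2 entry.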