Tags: regex, linux, bash, grep, curly-braces

Curly braces {} with grep and regular expressions: why does the match exceed the maximum value?


I've been self-studying shell scripting for a while now, and I came across a section of a Linux Fundamentals manual about grep and curly braces {}. My problem is that when I use {} to search for a pattern that occurs between a minimum and a maximum number of times, grep returns lines that exceed the maximum I specified.

Here is what happened:

Express11:~/unix_training/reg_ex # cat reg_file2
ll
lol
lool
loool
loooose
Express11:~/unix_training/reg_ex # grep -E 'o{2,3}' reg_file2
lool
loool
loooose
Express11:~/unix_training/reg_ex #

According to the manual, this should not happen, since I am specifying that I only want strings containing two to three consecutive o's.

EDIT: Actually, the reason I did not understand how the curly braces work was this simplistic explanation in the manual. I quote:

19.4.10. between n and m times

And here we demand exactly from minimum 2 to maximum 3 times.

paul@debian7:~$ cat list2
ll
lol
lool
loool
paul@debian7:~$ grep -E 'o{2,3}' list2
lool
loool
paul@debian7:~$ grep 'o\{2,3\}' list2
lool
loool
paul@debian7:~$ cat list2 | sed 's/o\{2,3\}/A/'
ll
lol
lAl
lAl
paul@debian7:~$

Thanks to all those who replied.


Solution

  • # grep -E 'o{2,3}' reg_file2
    lool
    loool
    loooose
    

    The command works correctly. grep prints every line that contains a match anywhere in it, and o{2,3} is not anchored, so it matches the first three o's in loooose. That is why the last line also appears in the output.
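To see which part of each line the pattern actually matched, grep's -o option prints only the matched text. This is a minimal sketch: the temporary file and the variable names are made up for the demo, and the data is the sample from the question.

```shell
# Recreate the sample file from the question (temp file is for this demo only).
sample=$(mktemp)
printf '%s\n' ll lol lool loool loooose > "$sample"

# -o prints only the matched text, one match per output line.
matched=$(grep -oE 'o{2,3}' "$sample")
printf '%s\n' "$matched"

rm -f "$sample"
```

This prints oo, ooo and ooo. The third ooo comes from loooose: the pattern is satisfied by the first three o's, and nothing in it says that another o may not follow.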

    I think the command you are actually looking for is:

    $ grep -P '(?<!o)o{2,3}(?!o)' file
    lool
    loool
    

    Explanation:

    • (?<!o) A negative lookbehind, which asserts that the match is not preceded by the letter o.

    • o{2,3} Matches 2 or 3 o's.

    • (?!o) A negative lookahead, which asserts that the match is not followed by the letter o.
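The lookarounds are zero-width, so they constrain the match without becoming part of it. A small sketch with -o shows this, assuming a GNU grep built with PCRE support (-P is not available in every grep); the temporary file and variable names are hypothetical.

```shell
# Recreate the sample file from the question (temp file is for this demo only).
sample=$(mktemp)
printf '%s\n' ll lol lool loool loooose > "$sample"

# -P enables Perl-compatible regexes (GNU grep with PCRE support assumed);
# -o prints only the matched text.
pcre_matched=$(grep -oP '(?<!o)o{2,3}(?!o)' "$sample")
printf '%s\n' "$pcre_matched"

rm -f "$sample"
```

This prints oo and ooo only. In loooose every candidate run of two or three o's has another o directly before or after it, so one of the assertions always fails and the line is skipped.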

    OR

    $ grep -E '(^|[^o])o{2,3}($|[^o])' file
    lool
    loool
    

    Explanation:

    • (^|[^o]) Matches the start of the line (^) or any character other than o.

    • o{2,3} Matches 2 or 3 o's.

    • ($|[^o]) Matches the end of the line ($) or any character other than o.
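Unlike the lookarounds, the groups in the -E version consume the neighbouring characters. That makes no difference when grep is only selecting lines, but it matters in a substitution such as the sed example from the manual. The sketch below assumes a sed that accepts -E (GNU and BSD sed both do); the temporary file and variable names are hypothetical.

```shell
# Recreate the sample file from the question (temp file is for this demo only).
sample=$(mktemp)
printf '%s\n' ll lol lool loool loooose > "$sample"

# The groups consume the neighbouring characters, so -o shows them as well.
ere_matched=$(grep -oE '(^|[^o])o{2,3}($|[^o])' "$sample")
printf '%s\n' "$ere_matched"

# In a sed substitution, put the neighbours back with \1 and \2.
replaced=$(sed -E 's/(^|[^o])o{2,3}($|[^o])/\1A\2/' "$sample")
printf '%s\n' "$replaced"

rm -f "$sample"
```

The grep call prints lool and loool (whole matches, including the surrounding l's). The sed call prints ll, lol, lAl, lAl and loooose, leaving the line with four o's unchanged.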