Tags: regex, linux, zcat, zgrep

How can I specify a regex in a zgrep/zcat command?


I want to find, in a list of words, every word that contains the same letter at least 3 times. To achieve that, I wrote .*(\w).*\1.*\1.*\1.*, which you can test here: http://www.regexplanet.com/advanced/java/index.html. However, I don't know how to put it in my zgrep command.

How can I use this regex in a zgrep command?


Solution

  • A couple of notes:

    • You do not need leading and trailing .* to cover the start and end of the line, since grep reports a line whenever the pattern matches anywhere in it
    • \w matches letters, digits and underscores in Perl-style (NFA) regex flavors such as Java's. GNU grep also accepts \w as an extension, but in a portable POSIX pattern it is safer to use [[:alnum:]_]
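    • To form a capturing group in a POSIX BRE pattern, use escaped parentheses, \(...\).
    • Your pattern has one capturing group followed by three back-references, so it actually requires at least four occurrences of the same character. For at least three, keep only two .*\1 subpatterns (or use \{2\} instead of \{3\} in the shorter pattern).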

    Thus, use

    zgrep '\([[:alnum:]_]\).*\1.*\1.*\1' a.gz
    

    Or, since the three consecutive .*\1 subpatterns are repetitive, contract them with an interval quantifier:

    zgrep '\([[:alnum:]_]\)\(.*\1\)\{3\}' a.gz
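A small, self-contained shell sketch for trying this out: it writes an arbitrary sample word list (banana, apple, mississippi, abc, aaa) to a temporary gzip file and searches it with the expanded pattern. It contrasts the answer's pattern (one capture plus three back-references, so at least four occurrences) with a two-back-reference variant (at least three), and also shows the equivalent gzip -dc | grep pipeline for when zgrep is not at hand. It assumes GNU grep, gzip, zgrep and mktemp are installed; the file and variable names are illustrative only.

```shell
#!/bin/sh
# Illustrative sample data: arbitrary words, one per line,
# gzip-compressed into a temporary directory.
tmp=$(mktemp -d)
printf '%s\n' banana apple mississippi abc aaa | gzip -c > "$tmp/words.gz"

# One captured character plus three back-references:
# matches words with at least FOUR occurrences of some character.
four=$(zgrep '\([[:alnum:]_]\).*\1.*\1.*\1' "$tmp/words.gz")

# One captured character plus two back-references:
# matches words with at least THREE occurrences of some character.
three=$(zgrep '\([[:alnum:]_]\).*\1.*\1' "$tmp/words.gz")

# Same search without zgrep: decompress to stdout and pipe into grep.
three_pipe=$(gzip -dc "$tmp/words.gz" | grep '\([[:alnum:]_]\).*\1.*\1')

echo "At least four:"
echo "$four"
echo "At least three:"
echo "$three"

# Clean up the temporary sample file.
rm -rf "$tmp"
```

Run it with sh: the first search should print only mississippi, while the second (and its pipeline equivalent) should print banana, mississippi and aaa.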