
How to match all files containing WORD1 AND WORD2 on different lines with ag or rg (PCRE/Rust regex)


I have a long list of generated reports that I want to filter. Each report looks something like this:

Report Name
Report Date
Blah blah blah
Blah: WORD1
Blah blah
blah blah: WORD2
blah blah

I'm trying to use ag (PCRE regex) or rg (Rust regex) to find all files that contain both WORD1 AND WORD2 in different places in the file (that is, on different lines, with newlines in between).

I've already searched SX and found these, which didn't work:

> ag (?=.*WORD1)(?=.*WORD2)

> ag (?=.*WORD1)((.|\n)*)(?=.*WORD2)

UPDATE

As @WiktorStribiżew pointed out, ag uses PCRE. Sorry for the mistake.

My expected output is:

blah blah: WORD2

or just the list of matched files.


P.S. Currently I've managed to get by with this:

> ag "WORD2" $(ag -l "WORD1")

Solution

  • You may use a PCRE pattern with ag:

    (?s)^(?=.*WORD1)(?=.*WORD2).*\n\K(?-s).*WORD2
    


    Details:

    • (?s) - turns DOTALL mode on (. also matches line break chars)
    • ^ - start of string
    • (?=.*WORD1) - there must be WORD1 somewhere in the string
    • (?=.*WORD2) - there must be WORD2 somewhere in the string
    • .* - any 0+ chars, as many as possible, so the match runs up to the last place where the subsequent subpatterns can match (with a lazy quantifier, .*? matches as few chars as possible and stops at the first such place)
    • \n - a newline
    • \K - match reset operator discarding the currently matched text
    • (?-s) - DOTALL mode disabled (. does not match line breaks)
    • .*WORD2 - any 0+ chars other than line break chars, as many as possible, and then WORD2.
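
If you want to try the same idea outside ag, here is a minimal Python sketch. It is not the ag command itself: Python's re module has no \K, so a capture group stands in for it, and \A is used for the start of the string. The names BOTH_WORDS_LAST, BOTH_WORDS_FIRST and word2_line, as well as the sample texts, are made up for this illustration.

```python
import re

# Python's re has no \K, so a capture group returns the WORD2 line instead.
# (?s) lets "." match newlines; (?-s:...) switches that off for the final line.
BOTH_WORDS_LAST = re.compile(r"(?s)\A(?=.*WORD1)(?=.*WORD2).*\n((?-s:.*WORD2))")
# Lazy variant: .*? stops at the first line containing WORD2 instead of the last.
BOTH_WORDS_FIRST = re.compile(r"(?s)\A(?=.*WORD1)(?=.*WORD2).*?\n((?-s:.*WORD2))")


def word2_line(text, pattern=BOTH_WORDS_LAST):
    """Return a line containing WORD2 if text has both words, else None."""
    match = pattern.match(text)
    return match.group(1) if match else None


# Hypothetical sample data modelled on the report layout in the question.
report = (
    "Report Name\n"
    "Report Date\n"
    "Blah: WORD1\n"
    "first: WORD2\n"
    "Blah blah\n"
    "last: WORD2\n"
)
only_word2 = "Report Name\nblah blah: WORD2\n"

print(word2_line(report))                    # last: WORD2
print(word2_line(report, BOTH_WORDS_FIRST))  # first: WORD2
print(word2_line(only_word2))                # None
```

As with the ag pattern, the \n before the final part means a WORD2 that appears only on the very first line is not reported.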