Given a set of patterns A = {a_1, a_2, ..., a_n}, I want to print the paragraphs that contain all of those patterns: a_1, a_2, ..., a_n.
Suppose I have the following file:
$ cat main.txt
a
b
c
a
b
b
c
c
a
a
b
c
I want to print all the paragraphs that contain the following patterns: \<a\>, \<b\>, \<c\>. That is, the output should be:
$ {some command here}
a
b
c
I've written the following command. However, it treats lines containing only spaces or tabs as part of a paragraph (recall that lines containing only whitespace must not be considered part of a paragraph). I also think it could be improved by running awk only once.
$ awk -v RS= '/\<a\>/ {print $0,"\n"}' main.txt |\
awk -v RS= '/\<b\>/ {print $0,"\n"}' |\
awk -v RS= '/\<c\>/ {print $0}'
a
b
c
Are there more effective ways of accomplishing this?
You have to prepare your input for the empty RS (paragraph mode):
awk '!NF{$0=""}1' main.txt > input.txt
This way, blank but non-empty lines (lines containing only spaces or tabs) are no longer considered part of a paragraph, and you remove the possibility of that whitespace becoming part of a match for one of your patterns. A pattern matching such whitespace is unlikely (though not impossible), but it is very likely that these lines merge separate paragraphs: without this step, the input "a\n \nb\nc" would be treated as one paragraph that matches all the patterns.
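To make the merging concrete, here is a small sketch using a hypothetical four-line sample ("a", a line holding a single space, "b", "c"). The helper names sample and stats are made up for illustration, and plain /a/, /b/, /c/ stand in for \<a\>, \<b\>, \<c\> so it also runs on awks that lack the GNU word-boundary operators.

```shell
# Hypothetical sample: "a", a line holding one space, "b", "c".
sample() { printf 'a\n \nb\nc\n'; }

# Print "<number of records> <records matching /a/, /b/ and /c/>" for stdin.
# With RS= (paragraph mode) only truly empty lines separate records.
stats() { awk -v RS= '/a/ && /b/ && /c/ { m++ } END { print NR, m+0 }'; }

sample | stats                        # prints: 1 1
sample | awk '!NF{$0=""}1' | stats    # prints: 2 0
```

Unprepared, the space-only line glues everything into one record that wrongly matches all three patterns; once it is emptied, there are two records ("a" and "b\nc") and neither matches.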
Of course, you can run awk just once and test all the patterns together on each paragraph. But even testing one pattern at a time, as you do now, works once the input is prepared.
awk -v RS= -v ORS='\n\n' '/\<a\>/ && /\<b\>/ && /\<c\>/' input.txt
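Since the question is about an arbitrary set A = {a_1, ..., a_n}, here is a one-pass sketch that needs neither the temporary file nor a separate preparation step: instead of relying on RS=, it collects lines into a paragraph itself and treats any line with NF == 0 (empty or blanks only) as a separator. The function name paras_with_all and the environment variable PATS are hypothetical; the patterns are whitespace-separated regular expressions, so a single pattern cannot itself contain a space.

```shell
# paras_with_all PATTERNS [FILE...]   (hypothetical helper name)
# Prints every paragraph that matches all whitespace-separated regexes in
# PATTERNS. A paragraph ends at any line that is empty or holds only blanks.
paras_with_all() {
    _pats=$1
    shift
    # Pass the patterns through the environment (hypothetical PATS variable)
    # so awk does not apply -v escape processing to backslashes in them.
    PATS=$_pats awk '
        BEGIN { n = split(ENVIRON["PATS"], p, " ") }   # p[1..n]: required patterns

        # Print the buffered paragraph if it matches every pattern, then reset it.
        function flush(   i) {
            if (para != "") {
                for (i = 1; i <= n; i++)
                    if (para !~ p[i]) break
                if (i > n) {
                    if (printed++) print ""   # blank line between printed paragraphs
                    print para
                }
            }
            para = ""
        }

        NF == 0 { flush(); next }                      # empty or blank-only line ends a paragraph
        { para = (para == "" ? $0 : para "\n" $0) }    # otherwise append the line
        END { flush() }                                # last paragraph may lack a separator
    ' "$@"
}
```

With GNU awk, which supports \< and \> in dynamic regexes, the question's case would be paras_with_all '\<a\> \<b\> \<c\>' main.txt. Reading the patterns from ENVIRON rather than -v is a deliberate choice: -v values undergo escape processing, so the backslashes would otherwise have to be doubled.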