bash unix grep xargs

Using grep output as pattern for a second grep


I want to use the output of a grep command as the pattern argument in a second grep.

grep "pattern1" file1 | grep [output of previous grep] file2

The desired behaviour is to find the lines matching the pattern in file1, then find the lines in file2 that also contain what the first grep returned. (The reason I'm not searching for the pattern in file2 directly is that I'm doing additional stuff like sed between the two greps.)

I think this should be possible with xargs, but I've only been able to find examples for using the output of the first grep in place of file2, not in place of the pattern argument.

I've noticed while making this thread that there is a similar question from five years ago with solutions using awk. I will probably use those solutions if necessary, but I'm curious to know if this is possible with grep and xargs.

EDIT: Here are some example files.

out_prot.fq

>p1.A2|PDKKMNCP_00148 
MDAFELPDTLAQALQRRAAK
>p1.A2|PDKKMNCP_00161 
MNPEHAQKLARRFVELPLE
>p1.A2|PDKKMNCP_00162 
MTGTTAARIAKRFVGLSLEQRRQFLSR

p1.A2.tsv

ProtName p1.A2|PDKKMNCP_00163 69.479 557 169 1 103 659 1087 1642 0.0 803 83
ProtName p1.A2|PDKKMNCP_00161 50.707 566 256 10 114 659 51 613 3.31e-170 523 81
ProtName p1.A2|PDKKMNCP_00148 48.522 575 283 2 104 672 1726 2293 1.78e-166 536 85
ProtName p1.A2|PDKKMNCP_00148 46.824 551 281 5 116 659 682 1227 1.76e-142 467 85

I have now tried the following, as suggested by @Dominique and @david-grayson:

grep $(grep ">" out_prot_test.fq | sed 's/>//') p1.A2.tsv > test

This is the output I get:

test

p1.A2.tsv:ProtName p1.A2|PDKKMNCP_00148 48.522 575 283 2 104 672 1726 2293 1.78e-166 536 85
p1.A2.tsv:ProtName p1.A2|PDKKMNCP_00148 46.824 551 281 5 116 659 682 1227 1.76e-142 467 85

This is almost what I want, except that the name of the second file is prepended to each output line (the p1.A2.tsv: at the beginning of each line). I could trim it out with sed again, but there might be situations where that's not possible. Is there a way to prevent it from appearing at all?

What I want:

ProtName p1.A2|PDKKMNCP_00148 48.522 575 283 2 104 672 1726 2293 1.78e-166 536 85
ProtName p1.A2|PDKKMNCP_00148 46.824 551 281 5 116 659 682 1227 1.76e-142 467 85

Solution

  • Looks fairly easy to me:

    grep $(grep "entry1" file1) file2
    

    $(...) is the standard command substitution syntax. In some cases you might need to replace it with backticks (grave accents), but those are very difficult to show here on the site.

    I'm not sure what you mean in your comment. I have two files: file1 (containing "entry") and file2 (containing "entry2"). When I run grep $(grep "entry" file1) file2, I only see "entry2".
    But when I run grep $(grep "entry" file1) file*, I see the file names too ("file1:entry" and "file2:entry2"). Is this what you are referring to?
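
A note on where the p1.A2.tsv: prefix in the question most likely comes from. An unquoted $(...) is split on whitespace, so when the inner grep prints several lines, only the first word becomes the pattern and the remaining words are treated as file names. grep prefixes each match with the file name whenever it is given more than one file, even if some of those files do not exist. The sketch below reproduces that effect and then passes every ID as a fixed-string pattern instead. It assumes GNU grep (where -f - reads patterns from standard input) and recreates the sample files from the question in a temporary directory. The variable names workdir, naive and matches are made up for illustration.

```shell
#!/usr/bin/env bash
set -eu

# Recreate the sample files from the question in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir"

cat > out_prot.fq <<'EOF'
>p1.A2|PDKKMNCP_00148
MDAFELPDTLAQALQRRAAK
>p1.A2|PDKKMNCP_00161
MNPEHAQKLARRFVELPLE
>p1.A2|PDKKMNCP_00162
MTGTTAARIAKRFVGLSLEQRRQFLSR
EOF

cat > p1.A2.tsv <<'EOF'
ProtName p1.A2|PDKKMNCP_00163 69.479 557 169 1 103 659 1087 1642 0.0 803 83
ProtName p1.A2|PDKKMNCP_00161 50.707 566 256 10 114 659 51 613 3.31e-170 523 81
ProtName p1.A2|PDKKMNCP_00148 48.522 575 283 2 104 672 1726 2293 1.78e-166 536 85
ProtName p1.A2|PDKKMNCP_00148 46.824 551 281 5 116 659 682 1227 1.76e-142 467 85
EOF

# Unquoted substitution: the three IDs are split into three words.
# The first is used as the pattern, the other two as (missing) file names,
# so grep sees several file operands and prefixes matches with the file name.
# Errors about the missing files are discarded; "|| true" absorbs grep's
# non-zero exit status so the script keeps going.
naive=$(grep $(grep '^>' out_prot.fq | sed 's/^>//') p1.A2.tsv 2>/dev/null || true)
printf 'unquoted substitution:\n%s\n\n' "$naive"

# Pattern-list approach: one ID per line on standard input.
# -F treats each ID as a literal string (so "." and "|" are not special),
# -f - reads the pattern list from standard input (GNU grep behaviour).
# The second sed expression trims trailing blanks from the header lines.
matches=$(grep '^>' out_prot.fq \
  | sed 's/^>//; s/[[:space:]]*$//' \
  | grep -F -f - p1.A2.tsv)
printf 'pattern list via -f -:\n%s\n' "$matches"
```

With the sample out_prot.fq shown in the question, the second command also returns the PDKKMNCP_00161 row, because that ID appears in both files. Any sed step between the two greps can simply be added to the pipeline before the final grep. If only the prefix needs to go, GNU grep's -h option suppresses file names in the output.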