I want to grep a VCF file for multiple positions. The following works:
grep -f template_gb37 file.vcf > gb37_result
My template_gb37 has 10000 lines and it looks like this:
1 1156131 rs2887286 C T
1 1211292 rs6685064 T C
1 2283896 rs2840528 A G
When the VCF has the rs ID, it works perfectly.
The problem is that the VCF I am going to grep may not have the rs ID and may have "." instead:
file.vcf
#CHROM POS ID REF ALT ....
1 1156131 . C T ....
1 1211292 . T C ....
1 1211292 . T C ....
Is there a way to search for my multiple patterns so that they match either the rs ID or just "."?
Thanks in advance
I think you mean the third field in your file (the ID column) could be either . or rsNNNNNN, and you want to allow either. For that you need an "alternation", which you write with a | like this:
printf "cat\nmonkey\ndog" | grep -E "cat|dog"
cat
dog
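One detail is worth checking before putting an alternation inside a longer pattern: | has the lowest precedence in an extended regular expression, so it splits the whole pattern unless both alternatives are enclosed in parentheses. The sketch below illustrates this with a made-up record (the variable names and the G/A alleles are only for illustration) whose ID is "." but whose REF and ALT differ from the template:

```shell
# Hypothetical record: right position, ID is ".", but REF/ALT are G/A, not C/T.
line='1 1156131 . G A'

# Ungrouped: read as "1 1156131 \." OR "rs2887286 C T",
# so REF and ALT are never checked when the ID is ".".
ungrouped=$(printf '%s\n' "$line" | grep -cE '1 1156131 \.|rs2887286 C T' || true)

# Grouped: the alternation covers only the ID field, and C T must still follow.
grouped=$(printf '%s\n' "$line" | grep -cE '1 1156131 (\.|rs2887286) C T' || true)

echo "ungrouped=$ungrouped grouped=$grouped"
```

The ungrouped pattern counts the record as a match, while the grouped one rejects it.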
So your pattern file "template_gb37" needs to look like this, with the parentheses around both alternatives so that the | applies only to the third field:
1 1156131 (\.|rs2887286) C T
1 1211292 (\.|rs6685064) T C
1 2283896 (\.|rs2840528) A G
And you need to search with:
grep -Ef PATTERNFILE file.vcf
If you don't want to change your pattern file, you can edit it "on-the-fly" each time you use it. So, if "template" currently looks like this:
1 1156131 rs2887286 C T
1 1211292 rs6685064 T C
1 2283896 rs2840528 A G
the following awk will edit it:
awk '{$3 = "(\\.|" $3 ")"}1' template
to make it this:
1 1156131 (\.|rs2887286) C T
1 1211292 (\.|rs6685064) T C
1 2283896 (\.|rs2840528) A G
Note that assigning to $3 makes awk rejoin the fields with single spaces; if your files are tab-separated, add -v OFS='\t' to the awk command.
Putting it together, you could use my whole answer like this (the <( ) process substitution needs bash or a similar shell):
grep -Ef <( awk '{$3 = "(\\.|" $3 ")"}1' template ) file.vcf
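As an alternative to the regex approach, here is a minimal sketch that skips the ID column entirely and matches on CHROM, POS, REF and ALT with an awk lookup. It is a different technique from the grep one above, and it assumes those four columns sit in fields 1, 2, 4 and 5 of both files. The scratch directory and the sample records (including the G/A record) are made up for illustration:

```shell
# Hypothetical scratch directory holding sample data based on the question.
workdir=$(mktemp -d)

cat > "$workdir/template_gb37" <<'EOF'
1 1156131 rs2887286 C T
1 1211292 rs6685064 T C
1 2283896 rs2840528 A G
EOF

# Tab-separated sample VCF: one record with ".", one with its rs ID,
# and one whose REF/ALT pair is not in the template.
printf '#CHROM\tPOS\tID\tREF\tALT\n'   >  "$workdir/file.vcf"
printf '1\t1156131\t.\tC\tT\n'         >> "$workdir/file.vcf"
printf '1\t1211292\trs6685064\tT\tC\n' >> "$workdir/file.vcf"
printf '1\t2283896\t.\tG\tA\n'         >> "$workdir/file.vcf"

# First file: remember each CHROM/POS/REF/ALT combination as an array key.
# Second file: skip header lines, then print records whose key was remembered.
awk 'NR == FNR { want[$1, $2, $4, $5]; next }
     /^#/      { next }
     ($1, $2, $4, $5) in want' "$workdir/template_gb37" "$workdir/file.vcf" > "$workdir/gb37_result"

cat "$workdir/gb37_result"
```

Because awk splits on any run of spaces or tabs by default, this does not depend on the template and the VCF using the same separator, and it compares whole fields rather than substrings, so chromosome 1 cannot accidentally match chromosome 11.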