Does zgrep limit the number of search patterns?

I have the following two commands:

# 6 million lines
"zgrep -oF -f /projects/lab.mis/index/temp_hg19.inx /projects/incoming/M1_R2.fastq.gz"
# 6 thousand lines 
"zgrep -oF -f /projects/lab.mis/index/optimize_hg19.inx /projects/incoming/M1_R2.fastq.gz"

The first uses a pattern file with over 6 million patterns, while the second uses one with only about 6,000.

Both pattern files should contain the fixed string "GATTCCAGATGGAGGT".

However, only the second command, the one with 6K search terms, returns the match. Is there a reason for this? I don't see any error message, so I'm very confused.

Solution

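With -o, grep prints only non-overlapping matches, scanning each line from left to right. So [part of] the string you expect to find may already be covered by [the concatenation of] other matching strings from the bigger file, in which case it is never printed.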

For example:

$ cat file
FOOGATTCCAGATGGAGGTBAR

with a small file containing just the one string to match:

$ cat strings1
GATTCCAGATGGAGGT

$ grep -Fof strings1 file
GATTCCAGATGGAGGT

and with a larger file that includes that string plus others:

$ cat strings2
GATTCCAGATGGAGGT
FOOG

$ grep -Fof strings2 file
FOOG

Since grep reported a match on FOOG, the G that would be the start of GATTCCAGATGGAGGT has been consumed by that match and is no longer available for grep to match; all that's left to search is ATTCCAGATGGAGGTBAR.
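To see that the line itself is still found and only the -o output hides the string, here is a small sketch comparing grep with and without -o on the same example. The scratch directory is just for illustration; the expected output follows from grep reporting the leftmost match first.

```shell
#!/bin/sh
# Recreate the example files in a throwaway directory.
tmp=$(mktemp -d)
printf 'FOOGATTCCAGATGGAGGTBAR\n' > "$tmp/file"
printf 'GATTCCAGATGGAGGT\nFOOG\n' > "$tmp/strings2"

# Without -o, grep only reports which lines match, so the line is printed.
grep -Ff "$tmp/strings2" "$tmp/file"
# -> FOOGATTCCAGATGGAGGTBAR

# With -o, matches cannot overlap: FOOG is found first and consumes the G.
grep -oFf "$tmp/strings2" "$tmp/file"
# -> FOOG

rm -r "$tmp"
```

So if you only need to know which lines contain any of the strings, dropping -o avoids the problem, at the cost of not being told which string matched.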

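If you want to find all matching strings, you can do the following (it reports each string at most once per line, since index() returns only the first occurrence):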

$ awk 'NR==FNR{strs[$0]; next} {for (str in strs) if ( pos=index($0,str) ) print substr($0,pos,length(str))}' strings2 file
GATTCCAGATGGAGGT
FOOG

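(Really zcat file.gz | awk '...' strings2 - for your gzipped file, of course; the - tells awk to read the decompressed data from stdin after the strings file.) That will obviously be slower than zgrep, since for every input line it loops over every string in the pattern file, which with 6 million patterns is a lot of work.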
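A minimal sketch of the full pipeline for a gzipped input, assuming a POSIX sh and awk. The function name find_all is hypothetical. Unlike the one-liner above, it keeps searching after each hit, so repeated and overlapping occurrences of the same string on a line are all reported; blank lines in the pattern file are skipped so index() is never asked to search for an empty string.

```shell
#!/bin/sh
# find_all PATTERN_FILE GZ_FILE   (hypothetical helper name)
# Prints every occurrence of every fixed string listed in PATTERN_FILE,
# one per line, found in the gzip-compressed GZ_FILE.
find_all() {
    # gzip -dc decompresses to stdout, like GNU zcat.
    # awk reads PATTERN_FILE first (NR == FNR), then "-" (stdin).
    gzip -dc "$2" |
    awk '
        NR == FNR { if ($0 != "") strs[$0]; next }   # load patterns, skip blank lines
        {
            for (str in strs) {
                rest = $0
                # index() finds only the first occurrence, so keep
                # searching from one character past each hit.
                while ((pos = index(rest, str)) > 0) {
                    print str
                    rest = substr(rest, pos + 1)
                }
            }
        }
    ' "$1" -
}
```

Usage would be something like find_all /projects/lab.mis/index/temp_hg19.inx /projects/incoming/M1_R2.fastq.gz > matches.txt. The order in which awk visits the strings in a for (str in strs) loop is unspecified, so pipe the output through sort if the order matters.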