Consider a text file of 1.5 million lines with around 50-100 words per line.
To find the lines that contain a given word, using os.popen('grep -w word infile')
seems to be faster than
for line in infile:
    if word in line:
        print(line)
How else could one search for a word in a text file in Python? What is the fastest way to search through a large, unindexed text file like this?
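There are several fast string search algorithms (see Wikipedia). They require you to compile the word into some structure first. GNU grep relies on this kind of algorithm: Boyer-Moore for a single fixed string, and Aho-Corasick-style matching when searching for several strings at once.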
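To make "compile the word into some structure" concrete, here is a minimal sketch of the Boyer-Moore-Horspool variant. The function names (compile_word, find, matching_lines) are made up for illustration and are not from grep or the Python standard library. Note that it does plain substring matching like in, not whole-word matching like grep -w, and that in pure Python it will most likely be slower than the built-in in, which runs in C; it only shows the idea of building the structure once and reusing it for every line.

```python
def compile_word(word):
    """Precompute the Horspool bad-character shift table for `word`."""
    m = len(word)
    # For each character except the last one: how far the search window
    # may move when that character sits under the window's last position.
    return {ch: m - 1 - i for i, ch in enumerate(word[:-1])}


def find(text, word, shift):
    """Return the index of the first occurrence of `word` in `text`, or -1."""
    m, n = len(word), len(text)
    if m == 0:
        return 0
    pos = 0
    while pos + m <= n:
        if text[pos:pos + m] == word:
            return pos
        # Characters that are not in the table allow a full-length shift.
        pos += shift.get(text[pos + m - 1], m)
    return -1


def matching_lines(lines, word):
    """Return the lines that contain `word` as a substring."""
    shift = compile_word(word)  # compiled once, reused for every line
    return [line for line in lines if find(line, word, shift) != -1]
```

With an open file, this could be used as matching_lines(infile, word), since a file object iterates over its lines.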
I haven't seen the source code for Python's in operator, but either the word is compiled for each line, which takes time (I doubt in compiles anything, though obviously it could compile it, cache the results, etc.), or