Tags: python, linux, performance, text-files

Is it possible to speed up Python I/O?


Consider this python program:

import sys

lc = 0
# Iterate over the file line by line; the with-block closes it afterwards.
with open(sys.argv[1]) as f:
    for line in f:
        lc += 1

print(lc, sys.argv[1])

On my 6 GB text file, it completes in about 2 minutes.

Question: is it possible to go faster?

Note that the same time is required by:

wc -l myfile.txt

so I suspect the answer to my question is just a plain "no".

Note also that my real program does something more interesting than counting lines, so please give a generic answer, not line-counting tricks (like keeping line-count metadata in the file).

PS: I tagged this question "linux" because I'm only interested in Linux-specific answers. Feel free to give OS-agnostic, or even other-OS, answers if you have them.

See also the follow-up question


Solution

  • You can't get any faster than the maximum disk read speed.

    To get close to the maximum disk speed, you can use the following two tips:

    1. Read the file with a big buffer. This can either be coded "manually" or done simply by using io.BufferedReader (available in Python 2.6+).
    2. Do the newline counting in another thread, in parallel with the reads.
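
Here is a minimal sketch of both tips, written for Python 3. It is an illustration under assumptions, not a benchmarked solution: the function names, the 16 MiB chunk size, and the queue depth are all made up for this example and would need tuning on a real machine. Opening a file in binary mode with a `buffering` argument returns an io.BufferedReader, so the first function covers tip 1. The second function hands each chunk to a worker thread through a bounded queue, so the next read can overlap with the processing of the previous chunk.

```python
import queue
import threading

# Assumed buffer size (16 MiB); not from the original answer, tune for your disk.
CHUNK_SIZE = 16 * 1024 * 1024


def count_lines_buffered(path, chunk_size=CHUNK_SIZE):
    """Tip 1: read large binary chunks and count the newline bytes in each."""
    count = 0
    # In binary mode, open() with buffering > 1 returns an io.BufferedReader.
    with open(path, "rb", buffering=chunk_size) as reader:
        while True:
            chunk = reader.read(chunk_size)
            if not chunk:  # empty bytes object means end of file
                break
            count += chunk.count(b"\n")
    return count


def count_lines_threaded(path, chunk_size=CHUNK_SIZE, max_pending=4):
    """Tip 2: the main thread reads, a worker thread counts newlines."""
    # Bounded queue so the reader cannot run arbitrarily far ahead in memory.
    chunks = queue.Queue(maxsize=max_pending)
    result = []

    def worker():
        count = 0
        while True:
            chunk = chunks.get()
            if chunk is None:  # sentinel: no more data
                break
            count += chunk.count(b"\n")
        result.append(count)

    thread = threading.Thread(target=worker)
    thread.start()
    try:
        with open(path, "rb", buffering=chunk_size) as reader:
            while True:
                chunk = reader.read(chunk_size)
                if not chunk:
                    break
                chunks.put(chunk)
    finally:
        # Always send the sentinel so the worker terminates, even on error.
        chunks.put(None)
        thread.join()
    return result[0]
```

For a generic workload, replace `chunk.count(b"\n")` with the real per-chunk processing. Two caveats: counting newline bytes matches `wc -l`, so a final line without a trailing newline is not counted, and whether the threaded version helps at all depends on how much of the work runs while the reader is blocked on the disk. If the disk is already the bottleneck, as the `wc -l` timing suggests, neither variant can beat it.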