
When is iglob evaluated?


The documentation says:

Return an iterator which yields the same values as glob() without actually storing them all simultaneously.

Actually, I don't think that is true, particularly in an environment where files are created while the program is running. Example:

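import glob
from pathlib import Path

g = glob.iglob("*.py")  # the iterator is created first...
Path("a.py").touch()    # ...then two new files are created
Path("z.py").touch()
print(next(g))          # yet the new files can still be yielded
print(next(g))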

The problem, I think, is that iglob is only evaluated when you call next(). Is there a way to avoid that?
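The behaviour described above can be reproduced in plain Python, without IPython's `!touch`, inside a throwaway directory. This is a minimal sketch assuming CPython's `glob` module; the temporary directory and the file names are illustrative only. It creates a `glob()` result and an `iglob()` iterator at the same moment, then adds files before iterating:

```python
import glob
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # glob.escape() protects against wildcard characters in the temp path.
    pattern = os.path.join(glob.escape(tmp), "*.py")

    snapshot = glob.glob(pattern)  # lists the (still empty) directory right now
    lazy = glob.iglob(pattern)     # only creates a generator; nothing is listed yet

    # Create the files *after* both calls.
    Path(tmp, "a.py").touch()
    Path(tmp, "z.py").touch()

    # Iterating is what triggers the directory listing, so the iterator
    # sees files that did not exist when iglob() was called.
    seen = sorted(os.path.basename(p) for p in lazy)

print("glob():", snapshot)  # glob(): []
print("iglob():", seen)     # iglob(): ['a.py', 'z.py']
```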

My problem is the following: my Python script searches for files in a folder recursively with glob (recursive=True). I have tons of files, and there is no need to store all the filenames at the same time since I process them one by one, so I would like to use generators. However, my program moves the files into subdirectories, so I am worried that iglob might find the same file twice after I have moved it.


Solution

  • You can't force iglob() to read the directory listing early, no. That's how generators work; they won't start any work until you actually iterate. As such, glob() and iglob() only return the same results provided the filesystem doesn't change before iteration has completed.

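    Mostly, iglob() works lazily. For the simple case of a pattern with no directory part and no recursion, the whole directory listing is read (with os.listdir(), or os.scandir() in newer Python versions) as soon as you start iterating, and the rest of the iteration only works through names from that listing.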

    For recursive patterns (using **), or where the directory part of the pattern contains wildcards, iglob() postpones loading the listing of each subdirectory until it reaches it; only the top-most directory without a pattern is listed as soon as iteration starts. So if subdirectories change while you are iterating, you'll get inconsistent results too.

    Don't use iglob() if you need to capture the state of the filesystem at a specific time. Use glob() in that case, and 'freeze' that state in a Python list.
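Applied to the scenario in the question, freezing the listing before moving anything might look like the sketch below. It assumes every match should end up under one destination directory inside the searched folder; the function name `move_matches`, the `processed` directory and the `*.txt` pattern are hypothetical placeholders, not taken from the question.

```python
import glob
import os
import shutil
import tempfile
from pathlib import Path


def move_matches(root, pattern="**/*.txt", dest_name="processed"):
    """Move every file under `root` matching `pattern` into root/<dest_name>/.

    `dest_name` and the default pattern are hypothetical placeholders.
    Returns the moved paths, relative to `root`.
    """
    # glob() builds the complete list up front, so files moved below
    # (even into directories that ** would match) are not found again.
    matches = glob.glob(os.path.join(glob.escape(root), pattern), recursive=True)

    dest = os.path.join(root, dest_name)
    moved = []
    for path in matches:
        rel = os.path.relpath(path, root)
        # Keep the relative layout so same-named files don't overwrite each other.
        target = os.path.join(dest, rel)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        shutil.move(path, target)
        moved.append(rel)
    return moved


# Usage: a small throwaway tree with one file at the top and one in a subdirectory.
with tempfile.TemporaryDirectory() as root:
    Path(root, "sub").mkdir()
    Path(root, "a.txt").touch()
    Path(root, "sub", "b.txt").touch()

    moved = move_matches(root)
    remaining = sorted(
        os.path.relpath(p, root)
        for p in glob.glob(os.path.join(glob.escape(root), "**", "*.txt"), recursive=True)
    )

print("moved:", sorted(moved))
print("now at:", remaining)
```

Each file is moved exactly once even though the destination lies inside the searched tree. The trade-off is the one the question hoped to avoid: glob() keeps every matching path in memory at once, which is the price of a consistent snapshot.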