My Python function is given a (long) list of path arguments, each of which may be a glob. I make a pass over this list using `glob.glob` to extract all the matching filenames, like this:
```python
files = [filename for pattern in patterns for filename in glob.glob(pattern)]
```
That works, but the filesystem I'm on has very poor performance for directory listing operations, and this step currently adds about a minute(!) to the start-up time of my program. To speed it up, I would like to perform glob expansion only for non-trivial glob patterns (i.e. those that aren't just plain pathnames), along these lines:
```python
def cheapglob(pattern):
    # istrivial() is the missing piece: True if pattern contains no wildcards
    return [pattern] if istrivial(pattern) else glob.glob(pattern)

files = [filename for pattern in patterns for filename in cheapglob(pattern)]
```
Since `glob.glob` basically does a set of directory listings coupled with `fnmatch.fnmatch`, I thought it should be possible to somehow ask `fnmatch` whether a given string is a non-trivial pattern, but I can't see how to do that.
As a fallback, I guess I could try to identify such patterns in the string myself, though that feels a lot like reinventing the wheel and would be error-prone. This seems like the sort of thing there should be an elegant solution for.
According to the `fnmatch` source code, the only special characters it recognizes are `*`, `?`, `[` and `]` (and `]` only matters as the end of a set opened by `[`). Hence any pattern that contains none of these will only match itself. We can therefore implement the `cheapglob` mentioned in the question as
```python
import glob, re
def cheapglob(s): return glob.glob(s) if re.search("[][*?]", s) else [s]
```
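If you would rather not hard-code the character set, one alternative (a sketch, not the only option) is the documented `glob.escape` function (Python 3.4+). It wraps each `*`, `?` and `[` in brackets and leaves everything else alone, so a pattern is trivial exactly when escaping does not change it. The helper names below are hypothetical; the regex version is included only for comparison.

```python
import glob
import re


def has_magic_regex(s):
    # The test used by cheapglob above: any of ] [ * ? counts as special.
    return re.search("[][*?]", s) is not None


def has_magic_escape(s):
    # glob.escape() changes the string only if it contains '*', '?' or '['
    # (special characters in a Windows drive/UNC prefix are left unescaped).
    return glob.escape(s) != s
```

The two differ only for a lone `]`, e.g. `"a]b"`: the regex sends it to `glob.glob` (harmless, just slower), while the `glob.escape` check correctly treats it as a literal name.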
This `cheapglob` only hits the filesystem for patterns that include special characters. It differs subtly from a plain `glob.glob`: for a pattern with no special characters, such as `"foo.txt"`, it returns `["foo.txt"]` regardless of whether that file exists, while `glob.glob` returns `[]` if the file isn't there. So the calling function needs to handle the possibility that some of the returned files might not exist.
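One way to handle this, sketched under the assumption that the caller simply reads each file, is to attempt the `open` and catch `FileNotFoundError` instead of checking existence up front, so paths that do exist cost no extra filesystem call. `read_all` is a hypothetical caller, and collecting missing names into a list is just one possible policy; raising or logging may suit your program better.

```python
import glob
import re


def cheapglob(s):
    # Expand only non-trivial patterns; literal paths pass through unchecked.
    return glob.glob(s) if re.search("[][*?]", s) else [s]


def read_all(patterns):
    """Hypothetical caller: read every matched file, recording literal
    paths that turn out not to exist instead of failing on them."""
    contents, missing = {}, []
    for pattern in patterns:
        for filename in cheapglob(pattern):
            try:
                with open(filename) as f:
                    contents[filename] = f.read()
            except FileNotFoundError:
                missing.append(filename)
    return contents, missing
```

A file matched both literally and by a wildcard is read twice here but stored once, since `contents` is keyed by filename.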