What is the difference between a single /* and double /** trailing asterisk in a .gitignore file?


Consider the following two patterns in a .gitignore file:

foo/*
foo/**

The pattern format specification states:

An asterisk * matches anything except a slash. [...]

A trailing /** matches everything inside. For example, abc/** matches all files inside directory abc, relative to the location of the .gitignore file, with infinite depth.

That sounds like the same thing to me when used at the end of a pattern directly after a slash. I did test a few cases - with, and without subdirectories below foo and various negated patterns - and did not observe any difference.

Is there any scenario in which one would choose /** over /*?


At first, I expected to see a difference with pattern sets like the two below, but there was none: both ignore everything inside foo, and as the specification also notes, "[...] it is not possible to re-include a file if a parent directory of that file is excluded [...]"

foo/*
!foo/a/b/c/file.txt

foo/**
!foo/a/b/c/file.txt

Solution

  • The technical difference is clear enough. If you are using some fnmatch function that handles ** [1], and pass in this pattern-and-string pair:

    fnmatch(pattern="foo/**", string="foo/bar/baz")
    

    it will match. Using the pattern foo/*, however, it won't match.
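
To make that concrete, here is a minimal sketch of such a matcher in Python. The name glob_match is hypothetical, and the translation is an assumption for illustration only: it handles just * (anything except a slash) and ** (anything, slashes included), not the full gitignore pattern syntax and not Git's actual implementation.

```python
import re

def glob_match(pattern, string):
    """Match `string` against a simplified glob (illustrative only).

    '**' matches any characters, including slashes.
    '*'  matches any characters except a slash.
    Everything else is matched literally.
    """
    regex = ""
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            regex += ".*"
            i += 2
        elif pattern[i] == "*":
            regex += "[^/]*"
            i += 1
        else:
            regex += re.escape(pattern[i])
            i += 1
    return re.fullmatch(regex, string) is not None

print(glob_match("foo/**", "foo/bar/baz"))  # True
print(glob_match("foo/*", "foo/bar/baz"))   # False
print(glob_match("foo/*", "foo/bar"))       # True
```

The single * stops at the first slash after foo/, so it can only ever match direct children of foo.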

    Because of the way .gitignore files get handled, however, the difference has no effect for purely positive patterns. That is due to the sentence you quoted from the specification. Git reads its exclusion files (.gitignore, .git/info/exclude, and your global excludes file) before or during a depth-first search through the working tree. This depth-first search uses code of the general form below. I've used Python as the syntax here, but have not really tried to make it all work (nor made any attempt at efficiency, vs Git, which is, internally speaking, soggy with efficiency).

    import os

    # call the given function fn on each non-excluded file in the directory
    # (note that we have already committed to reading the directory).
    # `excludes` is a stand-in object: more() and is_excluded() are not real APIs.
    def search(dirpath, excludes, fn):
        try:
            with open(os.path.join(dirpath, ".gitignore")) as stream:
                excludes = excludes.more(dirpath, stream)
        except FileNotFoundError:
            pass # ignore the lack of a .gitignore
        all_files = os.listdir(dirpath)
        for name in all_files:
            full_path = os.path.join(dirpath, name)
            is_dir = os.path.isdir(full_path)
            if excludes.is_excluded(name, full_path, is_dir):
                continue # don't add this file or search this directory
            if is_dir:
                search(full_path, excludes, fn)
            else:
                fn(full_path)
    

    (We'll kick this whole thing off by cd-ing to the top of the working tree and using search(".", repo.top_excluder, add_file) or something like that. The top_excluder field here carries our global and per-repo patterns. Note that excludes.more() has to use a data structure that automatically clears subdirectory exclusions when the recursive search call returns, and needs to handle excluder-file priority, since a deeper .gitignore overrides an outer-layer .gitignore.)

    The way this treats an excluded directory is that it never bothers to look inside it at all. That's the source of the fact that, given positive exclusions only (no !foo/** kind of thing), there's no need for ** here: if we've determined that some directory will be excluded, it's already excluded along with everything in it.

    But we don't just have positive patterns: we have negative patterns too. Consider, e.g., this very simple .gitignore file:

    # ignore things named skip unless they're directories
    *skip
    !*skip/
    

    The negation, !*skip/, overrides the *skip, but only when the file named fooskip or barskip or whatever actually is a directory. So we do look inside fooskip/, and when we are in there, we skip another file named quuxskip but not a subdirectory named plughskip.

    This means that a simple method of defeating Git's optimization is:

    !*/
    

    Such a line, placed at the appropriate point in a .gitignore file (at or near the end), causes all directories to be searched, even if they would otherwise be ignored by an ignore rule. That is, our excludes.is_excluded() call will receive the local file name, whatever it is, and a True flag for the is-a-directory test, so that */ will match it; the prefix ! means that this directory is not ignored, and therefore we will search it recursively.

    This line completely discards the optimization Git is trying to make here, so it is relatively expensive if you have directories that should be ignored. But it is a very quick and dirty way to make .gitignore behave nicely if you don't want to use the more verbose method. That is, instead of:

    foo/*
    !foo/one/
    foo/one/*
    !foo/one/is/
    foo/one/is/*
    !foo/one/is/important/
    foo/one/is/important/*
    !foo/one/is/important/this-file
    

    you can simply write:

    foo/**
    !foo/one/is/important/this-file
    !foo/**/
    

    This will force Git to search, laboriously, through the entire foo directory and all of its subdirectories just so that the foo/one/is/important/this-file file can be matched by the second rule. Here we need the double * because these are prefixed by foo/; if we put this .gitignore file into foo/.gitignore we could use the simpler single * form:

    *
    !one/is/important/this-file
    !*/
    

    In any case this is the general principle, and is a reason that ** can be useful.
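
If you want to see the pruning and the re-inclusion side by side, the sketch below simulates the search over an in-memory tree. Everything in it is an assumption made for illustration: the names to_regex, parse, is_excluded and kept_files are hypothetical, only anchored patterns containing a slash are supported, and the "last matching rule wins" loop is a simplification of what Git really does.

```python
import re

def to_regex(pattern):
    # Simplified translation: '**' crosses slashes, '*' does not.
    parts, i = [], 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))

def parse(lines):
    # Each rule is (negated, directories_only, compiled_regex).
    rules = []
    for line in lines:
        negated = line.startswith("!")
        if negated:
            line = line[1:]
        dir_only = line.endswith("/")
        if dir_only:
            line = line[:-1]
        rules.append((negated, dir_only, to_regex(line)))
    return rules

def is_excluded(rules, path, is_dir):
    # The last rule that matches decides the outcome.
    excluded = False
    for negated, dir_only, regex in rules:
        if dir_only and not is_dir:
            continue
        if regex.fullmatch(path):
            excluded = not negated
    return excluded

def kept_files(tree, rules, prefix=""):
    # Depth-first walk; an excluded directory is never entered.
    kept = []
    for name, child in sorted(tree.items()):
        path = prefix + name
        is_dir = isinstance(child, dict)
        if is_excluded(rules, path, is_dir):
            continue
        if is_dir:
            kept.extend(kept_files(child, rules, path + "/"))
        else:
            kept.append(path)
    return kept

# Directories are dicts, files are None.
TREE = {
    "README": None,
    "foo": {
        "junk.txt": None,
        "one": {"is": {"important": {"this-file": None, "other": None}}},
    },
}

pruned = parse(["foo/*", "!foo/one/is/important/this-file"])
forced = parse(["foo/**", "!foo/one/is/important/this-file", "!foo/**/"])

print(kept_files(TREE, pruned))  # ['README']
print(kept_files(TREE, forced))  # ['README', 'foo/one/is/important/this-file']
```

With the first rule set, foo/one is excluded before the walk ever reaches the negated file, so the negation never gets a chance to match. With the second, the directory-only negation keeps every directory under foo open, and only the one named file survives.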

    (Note that you could also just force-add the one important file to Git's index before making the first commit that will hold it, or add it before creating the .gitignore rules that will ignore it. I dislike this particular trick myself, though, as it means you have a file carried around in Git's index that, if it's ever accidentally removed from the index, won't get re-added.)


    [1] Note that neither POSIX fnmatch nor Python's fnmatch handles ** in the first place. In Python, you would want glob.glob. Git, of course, does not expose its matcher as a function call at all.
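
For what it's worth, here is a small sketch of the glob.glob behaviour, using a throwaway directory. The helper name matches is hypothetical. Note that glob.glob only treats ** specially when called with recursive=True, and that it matches against the real file system rather than against an arbitrary string.

```python
import glob
import os
import tempfile

def matches(root, pattern, recursive):
    # Return the hits relative to root, with forward slashes, sorted.
    hits = glob.glob(os.path.join(root, pattern), recursive=recursive)
    return sorted(os.path.relpath(h, root).replace(os.sep, "/") for h in hits)

with tempfile.TemporaryDirectory() as root:
    # Build foo/bar/baz, where baz is an empty file.
    os.makedirs(os.path.join(root, "foo", "bar"))
    open(os.path.join(root, "foo", "bar", "baz"), "w").close()

    single = matches(root, "foo/*", recursive=False)
    double = matches(root, "foo/**", recursive=True)

print("foo/bar/baz" in single)  # False
print("foo/bar/baz" in double)  # True
```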