
How do I track only one specific subdirectory inside a directory in Git?


I understand you can use the NOT operator (!) to allow an exception. But according to the gitignore documentation:

It is not possible to re-include a file if a parent directory of that file is excluded.

Is there any way around this? For my case specifically, I have a folder called model_results, with many subfolders. I want to ignore every subfolder in model_results EXCEPT those whose names end in _final, and within those I want to keep only one specific subfolder, called model.

This is what I've tried to no avail:

# ignore subdirectories within model_results
model_results/**
# un-ignore these
!model_results/*_final/model

This still ignores regtree_final/model.

If it is possible, I suspect the reason for mine failing is the way I'm using wildcards.


Solution

  • Recursion: see recursion

    It's important to understand, here, that Git uses a recursive search to find files. What does this mean? Well, the joke version is the section title here, but in fact, we start with an initial directory (or folder, if you prefer that term) to search:

    func search(prefix string, d Directory) {
        for element in (all files and subdirectories in d) {
            skip = false
            name = element.name
            full_name = prefix + name
            type = element.type  // file, directory, or "other"
            switch type {
            case type_file:
                if not_in_index(full_name) and is_ignored(name, full_name, type_file)
                     skip = true
            case type_directory:
                 if is_ignored(name, full_name, type_directory)
                     skip = true
            case type_other:
                 skip = true
            }
            if skip {
                // don't even look at this any more
                continue
            }
            if type == type_file {
                git_add(full_name)
            } else {
                subdir = opendir(full_name)
                search(full_name + "/", subdir)
                closedir(subdir)
            }
        }
    }
    

    The actual code is tremendously more complicated, for various reasons, but the key is this: if we hit an ignored directory, we never look inside that directory! So if we ignore model_results/regtree_final (the directory), we never see any of the files inside model_results/regtree_final and therefore never add any of them. We never test whether they're ignored, or un-ignored, or whatever. We simply skip the entire directory.
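    The pruning can be seen in a small, runnable model. The sketch below is Python, not Git's real code: the nested-dict tree, the walk function, and the ignore_regtree_final callback are all hypothetical stand-ins, and the callback ignores exactly one directory by its full name. It also leaves out the index check from the pseudo-code. It records every path it is asked about, so it is easy to see that nothing under the ignored directory is ever tested.

```python
def walk(tree, is_ignored, prefix=""):
    """Return the files a pruning traversal would add.

    `tree` is a nested dict standing in for a working tree:
    a dict value is a directory, None is a file.
    """
    added = []
    for name, child in sorted(tree.items()):
        path = prefix + name
        is_dir = isinstance(child, dict)
        if is_ignored(path, is_dir):
            continue  # ignored: for a directory, nothing inside is examined
        if is_dir:
            added.extend(walk(child, is_ignored, path + "/"))
        else:
            added.append(path)
    return added


tested = []  # every path the ignore check was asked about


def ignore_regtree_final(path, is_dir):
    tested.append(path)
    return is_dir and path == "model_results/regtree_final"


tree = {
    "README": None,
    "model_results": {
        "regtree_final": {"model": {"weights.bin": None}},
    },
}

added = walk(tree, ignore_regtree_final)
print(added)   # ['README']
print(tested)  # ['README', 'model_results', 'model_results/regtree_final']
```

    Neither model nor weights.bin ever reaches the callback, so no rule written for them could have any effect.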

    To make sure that we do look inside model_results/regtree_final, we must arrange for is_ignored(name, full_name, type_directory) to say "no, this is not ignored". So how do we do that?

    Well, we can explicitly un-ignore the name regtree_final, or the full name model_results/regtree_final. That would require a line of the form:

    !regtree_final
    

    or:

    !model_results/regtree_final
    

    as a separate line in the .gitignore file that occurs after the model_results/** line.

    Side notes

    Side note: if we do ignore, say, model_results/blah we don't need to carefully also ignore model_results/blah/zonk because we'll never look at model_results/blah in the first place and hence never test model_results/blah/zonk. So as written, the ** is overkill. It's not wrong, it's just unnecessarily inclusive. Whether it will become right or necessary later is another question that you'll have to ask yourself later.

    Secondary side note: I prefer to use simple names in a .gitignore, rather than names containing embedded or leading slashes. That is, instead of a top-level .gitignore, I'd rather have a file named model_results/.gitignore in which I list the things to ignore that live within model_results. This is a personal preference, and you may be using some third-party software that prohibits this anyway, so it's up to you whether to adopt a similar preference.

    Just remember that there's an important difference between anchored and un-anchored entries in .gitignore files. When you use a top-level .gitignore to ignore entries in lower-level directories, all your entries for that lower-level directory are by definition anchored: you have no choice here. When using a .gitignore at the same level as the files, you have a choice. You might still want anchored entries, e.g., /* and !/*_final/, but you have the option of going either way, whereas with the top-level file, your entries are all anchored.

    Un-ignoring some or all sub-directories

    Now, I mentioned that to explicitly un-ignore regtree_final you could use:

    !model_results/regtree_final
    

    Note that you can also write:

    !model_results/regtree_final/
    

    The trailing slash here means "apply this rule if and only if this name is a directory name". That's why the is_ignored calls in the sample pseudo-code pass the entry's type. Rules like this, that end with slash, mean "match only if the type is directory".

    The problem with un-ignoring regtree_final like this is that it's necessary but not sufficient, as you want all *_final directories un-ignored. You can achieve this with:

    !model_results/*_final/
    

    Here we've used the trailing slash to mean "directories only, no files please" and the leading ! to mean "un-ignore", i.e., do look inside this directory.
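    How the trailing slash, the leading !, and rule order combine can be modeled in a few lines of Python. This is a simplified sketch, not Git's matcher: compile_rule and is_ignored are hypothetical helpers, only *, ?, and a trailing /** are handled (no character classes, escapes, or ** in other positions), and the file name notes_final is made up for the example. It assumes, as the gitignore documentation describes, that the last matching rule decides.

```python
import re


def compile_rule(line):
    """Parse one .gitignore line into (negated, dir_only, anchored, regex)."""
    negated = line.startswith("!")
    if negated:
        line = line[1:]
    dir_only = line.endswith("/")
    if dir_only:
        line = line[:-1]
    # A slash at the start or in the middle anchors the pattern to the
    # directory holding the .gitignore; otherwise it matches a bare name.
    anchored = "/" in line
    pieces = []
    for piece in line.lstrip("/").split("/"):
        if piece == "**":
            pieces.append(".*")  # only meant for a trailing "/**"
        else:
            pieces.append("".join(
                "[^/]*" if ch == "*" else "[^/]" if ch == "?" else re.escape(ch)
                for ch in piece
            ))
    return negated, dir_only, anchored, re.compile("/".join(pieces))


def is_ignored(path, is_dir, rules):
    """Apply rules in file order; the last rule that matches decides."""
    ignored = False
    for negated, dir_only, anchored, regex in rules:
        if dir_only and not is_dir:
            continue  # trailing-slash rules apply to directories only
        target = path if anchored else path.rsplit("/", 1)[-1]
        if regex.fullmatch(target):
            ignored = not negated
    return ignored


rules = [compile_rule(line)
         for line in ("model_results/**", "!model_results/*_final/")]

print(is_ignored("model_results/regtree_final", True, rules))  # False
print(is_ignored("model_results/scratch", True, rules))        # True
print(is_ignored("model_results/notes_final", False, rules))   # True
```

    The last call shows the effect of the trailing slash: a plain file whose name happens to end in _final is skipped by the un-ignore rule and stays ignored.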

    Now, if model_results/regtree_final/ contains another directory, e.g., if you have:

    model_results/regtree_final/one/file1.ext
    model_results/regtree_final/two/file2.ext
    

    there's a problem with your model_results/** line above, because that line ignores model_results/regtree_final/one.

    If you have no sub-sub-directories, this isn't really a problem, but when you do get into deeply nested, "bushy" directory structures, it gets a little tricky.
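    The difference between ** and a single * at this position can be checked with two regular expressions. These are hand-written approximations chosen for illustration (an assumption, not how Git matches internally): a trailing /** is taken to match everything inside the directory at any depth, while * is taken to stop at a slash.

```python
import re

# Hand-written regex approximations of two gitignore patterns:
#   model_results/**  -> everything inside model_results, at any depth
#   model_results/*   -> direct children of model_results only
double_star = re.compile(r"model_results/.*")
single_star = re.compile(r"model_results/[^/]*")

paths = [
    "model_results/regtree_final",
    "model_results/regtree_final/one",
    "model_results/regtree_final/one/file1.ext",
]

for path in paths:
    print(path,
          bool(double_star.fullmatch(path)),
          bool(single_star.fullmatch(path)))
# model_results/regtree_final True True
# model_results/regtree_final/one True False
# model_results/regtree_final/one/file1.ext True False
```

    Under this approximation, the single-star form only touches the first level, so once a *_final directory is un-ignored, the entries below it are not matched by the ignore rule at all.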

    One handy trick for Git ignore files is !*/. This is an un-anchored expression, so it applies to all names found anywhere. But it's a trailing-slash expression, so it applies only to directory names ... and it's an "un-ignore" rule because it starts with !. So it completely defeats the directory optimization that Git uses.

    That is, when we hit the:

        case type_directory:
             if is_ignored(name, full_name, type_directory)
    

    section of the code, we're always going to say "no, don't skip". Git will always look inside every directory, and slowly and painfully test every file in every directory. This is slow! It's a Big Hammer. Use with caution.