
How to use .gitignore to ignore everything in a directory except one file?


I've found several purported solutions to this problem here on SO, but for some unknown reason none of them work for me.

I need to ignore everything in a given folder except for one particular file. Easy, right? Not so fast.

I've tried most every suggested answer for each of these questions:

...but I'm no further along than when I started.

Here's the path to the file to include:

D:\Projects\Website\Website\bin\Settings.json

The repo is at:

D:\Projects\Website

My .gitignore file was generated by Visual Studio, so it contains this entry:

[Bb]in/

According to many of the answers to the questions above, I should be able to do something like this:

!/Website/[Bb]in/Settings.json

...but that doesn't work. The file is still ignored.

None of these permutations do the trick:

!*/Settings.json
!**/Settings.json
![Bb]in/Settings.json
![Bb]in/**/Settings.json
![Ww]ebsite/[Bb]in/Settings.json
!Website/bin/Settings.json
!/Website/bin/Settings.json

I've also tried putting a separate .gitignore file in bin:

# Don't block Settings.json
!Settings.json
!.gitignore

No luck.

How can I block everything in [Bb]in except for the Settings.json file?

  • Expected result: Website\bin\Settings.json is not ignored

  • Actual result: Website\bin\Settings.json continues to be ignored


Solution

  • Adding on to LeGEC's answer, which is fine, I note that you commented:

    That works. It strikes me as a bit brittle (maybe that's just my imagination, and hopefully I'll be proven spectacularly wrong), but if this is the only way, I can live with it.

    It's not the only way, and I have that same itchy feeling about it being brittle or otherwise somehow subtly wrong. It does work and it won't break in normal everyday use, but it just seems wrong to me to have files that are, and stay, tracked solely because they are tracked in the commits you extract, as you go about making new commits.

    The trick here is that the Git path name Website/bin/Settings.json results in a file that lives in a folder once extracted: the file Settings.json is in the folder bin (which in turn is in the folder Website, but that's just adding on to the pile; one "in-the-folder" layer is enough here).

    Note that to Git, Website/bin/Settings.json is just a file name: that file name gets stored like that, with forward slashes, in Git's index (AKA staging area).[1] The problem occurs later, when Git is scanning your working tree. The exclusion handling that Git does (using .git/info/exclude and the various .gitignore files) works via working tree files. It has to: it is all about untracked files, and the very definition of untracked file is a file that exists in your working tree, but not in Git's index.

    When Git is comparing the current (HEAD) commit's content—the set of stored files in the current commit, with all of their data—to the files in the index / staging-area, Git does not have to, and does not, look at your working tree at all. Everything Git needs is in the repository: the current commit is determined by reading HEAD, which resolves to a commit hash ID, which resolves to an internal tree object, which obtains for Git all the file names and modes and their hash IDs. The proposed next commit, in the index / staging-area, contains the file names and modes and their hash IDs. The hash IDs let Git know if files are 100% matches or not, and for most purposes that's all we care about: git status just prints an M for modified, or the word modified, without figuring out what actually changed, for instance.
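The hash-only comparison described above can be sketched in a few lines. This is a toy model, not Git's code: it assumes a SHA-1 repository, ignores file modes, and uses two hypothetical dictionaries (`head` and `index`) in place of the real tree object and index file.

```python
import hashlib

def blob_id(content: bytes) -> str:
    """Object ID Git gives a blob: SHA-1 over a 'blob <size>\\0' header plus the content."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()

def compare(head: dict, index: dict) -> dict:
    """Map path -> status letter by comparing hash IDs only (contents are never read)."""
    status = {}
    for path in sorted(set(head) | set(index)):
        if path not in head:
            status[path] = "A"      # only in the proposed next commit
        elif path not in index:
            status[path] = "D"      # removed from the proposed next commit
        elif head[path] != index[path]:
            status[path] = "M"      # same name, different hash ID
    return status

# Hypothetical snapshots: path -> blob hash ID.
head = {
    "README.md": blob_id(b"hello\n"),
    "Website/bin/Settings.json": blob_id(b'{"debug": false}\n'),
}
index = {
    "README.md": blob_id(b"hello\n"),
    "Website/bin/Settings.json": blob_id(b'{"debug": true}\n'),
}

print(compare(head, index))  # {'Website/bin/Settings.json': 'M'}
```

Nothing here touches a working tree, which is the point: both sides of the comparison live entirely inside the repository.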

    Reading through the working tree, though, is way harder. The OS gets in the way here. Sure, there may be a C library scandir or readdir function, or some other way to enumerate the contents of a folder. But Git may still have to call lstat on each name.[2] In any case, if you analyze timing results to find out why git status took more than 20 nanoseconds, you find that it spends a lot of time just reading directories. Wouldn't it be nice if we could find some shortcut for this?

    Enter .gitignore and other exclusion files: if we read the top level work-tree and find directories named tmp and zorg, but those directories are ignored—via * or */ or tmp or tmp/ or whatever—why, then, we don't even have to open and read them at all! It won't matter whether ./tmp contains one file, or one billion files: we'll skip the whole thing! Given that just opening and reading a directory to find its file names can take milliseconds—and using lstat on each name can add many more—this is a huge savings.

    So, Git does this. If Git is preparing a working-tree walk, and it is allowed to skip looking inside some folder / directory, it does skip looking inside that folder. Hence, if your .gitignore file says:

    *
    

    then any directory name will match, and Git will skip opening, much less reading, the directory. This happens to your Website folder.
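To make the shortcut concrete, here is a minimal sketch of a tree walk that never opens an ignored directory. It is a toy model under stated assumptions: the working tree is a hypothetical nested dictionary (`TREE`), and "ignored" is just a set of directory names rather than real .gitignore patterns.

```python
# Toy working tree: dicts are directories, None marks a file.
TREE = {
    "Website": {
        "bin": {"Settings.json": None, "app.dll": None},
        "index.html": None,
    },
    "tmp": {"junk%d.o" % i: None for i in range(1000)},
}

def walk(tree, ignored_dirs, prefix=""):
    """Return (files_seen, directories_opened); ignored directories are never opened."""
    files, opened = [], 1          # opening `tree` itself counts as one
    for name, child in sorted(tree.items()):
        path = prefix + name
        if child is None:
            files.append(path)
        elif name in ignored_dirs:
            continue               # pruned: nothing inside is ever looked at
        else:
            sub_files, sub_opened = walk(child, ignored_dirs, path + "/")
            files += sub_files
            opened += sub_opened
    return files, opened

files, opened = walk(TREE, set())
print(len(files), opened)                 # 1003 4

files, opened = walk(TREE, {"tmp"})
print(files, opened)                      # three Website files, 3 directories opened

files, opened = walk(TREE, {"Website", "tmp"})
print(files, opened)                      # [] 1
```

In the last call Settings.json is never seen at all, so no rule that mentions it could have any effect: the walk stopped at Website.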

    If your .gitignore reads:

    *
    !Website
    

    though, when Git reads the top level directory and finds the name Website, it can't ignore that. So Git opens the Website folder and finds bin, among other things. But: bin does match * and does not match Website, so it's ignore-able. That means Git can skip right over it, never looking inside it. You'll need to add Website/bin:

    *
    !Website
    !Website/bin
    

    Now Git has to open Website/bin and read it. Every file and directory within it can be ignored, so to get Settings.json within it to be not-ignored, we need to list that file:

    *
    !Website
    !Website/bin
    !Website/bin/Settings.json
    

    This fairly-minimal .gitignore file will work. It does, however, have one flaw. If there's a file or directory in bin named Website, that file or directory will be not-ignored. If not-ignored, Git will complain about it being untracked, or add it with git add ., or misbehave in some other undesirable way. To fix that, we should make sure that only the top-level Website is matched, not, e.g., bin/Website. This gets us to the second tricky part of Git's exclusion rules.


    [1] The format for index entries is a bit messy and gets compressed, depending on index format version (of which there are several), but git ls-files --stage will dump out the main stuff of interest, and there, you'll see the file named with embedded forward slashes. Git is, of course, capable of handling, and understanding, the backward slashes that Windows uses here, and hence stores the file in the bin folder in the Website directory.

    Strings in Git's index are case-sensitive and are stored as UTF-8 or equivalent, regardless of how the file names are stored in the file system, and regardless of whether the file system's file names are case-insensitive.

    [2] Some readdir variants include a type field, DT_DIR for instance, that (if you can rely on it) lets you skip this step sometimes; that can be a huge time-saver. I don't know if Git tries to do this: the working tree code has been revised multiple times, and now has all the complications from the fsmonitor code, which is a different way to speed things up, so I have not looked lately.


    The other tricky part: anchored vs un-anchored names

    To understand this part properly, I like to borrow a concept from regular expressions: the idea of anchoring something to the left or right. In a regular expression like me*s, we'll match ms pacman and message, but not memory, because we're looking for m, then any number of es, then s, and memory has no s. But we'll also match acmestorage because that has m followed by one e followed by s, embedded within acme and storage (which run together). We can avoid some of this by anchoring the match at the left: ^me*s won't match acmestorage because the m has to be the first letter.

    (REs also let us anchor at the right with $, typically. Each RE syntax has its own peculiarities, and .gitignore files use glob syntax rather than RE syntax, so let's not get too far down this rabbit hole. Just remember the idea of anchoring: sticking a match to the left or right, or both. In Git's case, an anchored path is an exact match, stuck at both sides. That's because the right side is always anchored. You'd have to use path/* or path/** to allow arbitrary right-hand-side parts.)
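The anchoring idea is easy to try out with Python's `re` module. This only illustrates regular-expression anchoring; as noted above, .gitignore patterns are globs, not REs. The extra name `mes` is added here purely so that the fully anchored case has something to match.

```python
import re

names = ["ms pacman", "message", "memory", "acmestorage", "mes"]

# Unanchored: the pattern may match anywhere inside the name.
unanchored = [n for n in names if re.search(r"me*s", n)]
# Anchored at the left: the match must start at the first character.
left = [n for n in names if re.search(r"^me*s", n)]
# Anchored at both ends: the whole name must match.
both = [n for n in names if re.search(r"^me*s$", n)]

print(unanchored)  # ['ms pacman', 'message', 'acmestorage', 'mes']
print(left)        # ['ms pacman', 'message', 'mes']
print(both)        # ['mes']
```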

    In our case, with .gitignore, we'd like to make sure that Website only matches at the top level, where we put the .gitignore file. To do that, we can start the entry with a leading slash:

    *
    !/Website
    !Website/bin
    !Website/bin/Settings.json
    

    Now bin/Website won't match the second line: the second line is anchored at the top (root) directory of the scan, and bin/Website is not at that level: it's one level down.

    You might think we should do that for all three file names:

    *
    !/Website
    !/Website/bin
    !/Website/bin/Settings.json
    

    This works, but it's not necessary, and the reason is that a .gitignore entry is automatically anchored if it has an embedded slash in it. Website/bin has a slash in it that is not at either end, so it's automatically anchored. Website/bin/Settings.json has two such slashes and is also anchored.

    More tricky parts

    I implied there were only two tricky parts here. I lied. 😀 There's one more way that exclusion files use slashes, which is unfortunately tricky, and that is that a final slash makes an entry match only a directory name. That is:

    bin/
    

    matches the bin directory but not a file named bin.

    This rule is independent of the remaining rules:

    • A leading ! negates the whole thing, so that !/Website/ means don't ignore.
    • A leading / (after any leading !) or any embedded slash that's not at the end means "anchored", so that !/Website/ is anchored.
    • A trailing / means only when it is a directory, so !/Website/ only matches a directory. The trailing slash doesn't count for anchoring purposes (and you should never use a double trailing slash) so if you want anchoring, be sure to include a leading or embedded slash.
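The three rules above can be read as three independent flags on each entry. The following is a minimal sketch of that reading, not Git's parser: the `Rule` type and `parse` function are hypothetical names, and comments, backslash escapes, and `**` are deliberately left out.

```python
from typing import NamedTuple

class Rule(NamedTuple):
    negated: bool    # leading "!": don't ignore
    anchored: bool   # leading or embedded "/": match from the .gitignore's directory
    dir_only: bool   # trailing "/": match directories only
    body: str        # what is left to compare against names

def parse(pattern: str) -> Rule:
    negated = pattern.startswith("!")
    if negated:
        pattern = pattern[1:]
    dir_only = pattern.endswith("/")
    if dir_only:
        pattern = pattern[:-1]       # the trailing slash does not count for anchoring
    anchored = "/" in pattern        # leading or embedded slash
    if pattern.startswith("/"):
        pattern = pattern[1:]
    return Rule(negated, anchored, dir_only, pattern)

for entry in ["*", "bin/", "!/Website/", "!Website/bin", "!*/"]:
    print(entry, parse(entry))
```

Here `bin/` comes out as directory-only but not anchored, while `!Website/bin` is anchored purely because of its embedded slash.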

    Using all of these rules, we come up with:

    *
    !/Website
    !Website/bin
    !Website/bin/Settings.json
    

    which is complete and correct (provided I have the right upper and lower case here: remember that Git will be case-sensitive, regardless of your file system). But there's one other trick we can use that gives us a slightly shorter file. Suppose we write:

    *
    !*/
    !Website/bin/Settings.json
    

    Git will:

    • open and read the top level working tree directory;
    • for each file, ignore it (*);
    • for each directory, not ignore it (!*/);
    • find the Website directory, hence open and read it;
    • for each file in Website/, ignore it (*);
    • find the directory bin and not ignore it (!*/);
    • open and read the Website/bin directory;
    • find each file and ignore it (*) except for Website/bin/Settings.json.

    The downside to this three-line version is that, during the above processing, Git will open and read every directory, including every subdirectory of every directory, so if there is a top-level tmp directory containing one billion files (directly or after recursing), Git will spend time checking every single one of them. That is, !*/ completely defeats the "don't bother looking here" optimization that saves so much time in some cases.
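The cost difference can be seen in a small simulation. This is a toy model under loud assumptions, not Git's implementation: the tree is a hypothetical in-memory dictionary, the last matching rule wins, an ignored directory is never opened, and `fnmatchcase` stands in for Git's glob matcher (it lets `*` cross slashes, which does not matter for the patterns used here).

```python
from fnmatch import fnmatchcase

# Toy working tree: dicts are directories, None marks a file.
TREE = {
    "Website": {
        "bin": {"Settings.json": None, "app.dll": None, "Website": None},
        "index.html": None,
    },
    "tmp": {"cache%d" % i: {"blob.o": None} for i in range(50)},
}

def matches(pattern, path, is_dir):
    """True if one pattern (already stripped of any leading '!') matches path."""
    if pattern.endswith("/"):
        if not is_dir:
            return False                      # trailing slash: directories only
        pattern = pattern[:-1]
    if "/" in pattern:                        # anchored: compare the whole path
        return fnmatchcase(path, pattern.lstrip("/"))
    return fnmatchcase(path.rsplit("/", 1)[-1], pattern)  # un-anchored: name only

def ignored(rules, path, is_dir):
    """Last matching rule wins; '!' rules un-ignore."""
    result = False
    for rule in rules:
        negated = rule.startswith("!")
        if matches(rule[1:] if negated else rule, path, is_dir):
            result = not negated
    return result

def scan(rules, tree=TREE, prefix=""):
    """Return (not_ignored_files, directories_opened)."""
    kept, opened = [], 1
    for name, child in sorted(tree.items()):
        path = prefix + name
        is_dir = child is not None
        if ignored(rules, path, is_dir):
            continue                          # ignored directories are never opened
        if is_dir:
            sub_kept, sub_opened = scan(rules, child, path + "/")
            kept += sub_kept
            opened += sub_opened
        else:
            kept.append(path)
    return kept, opened

explicit = ["*", "!/Website", "!Website/bin", "!Website/bin/Settings.json"]
wildcard = ["*", "!*/", "!Website/bin/Settings.json"]
too_short = ["*", "!Website/bin/Settings.json"]

print(scan(explicit))   # (['Website/bin/Settings.json'], 3)
print(scan(wildcard))   # (['Website/bin/Settings.json'], 54)
print(scan(too_short))  # ([], 1)
```

Both working rule sets keep the same single file, but in this model the `!*/` version opens all 54 directories while the explicit version opens 3. The two-line version opens only the top level and so never reaches Settings.json.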

    What would be nice is if Git's exclusion code were smart enough to realize that if you write:

    *
    !Website/bin/Settings.json
    

    it should automatically register !/Website/ and !/Website/bin/ into its exclusion list if those aren't already present. This seems pretty straightforward to do. (Precisely how to do the negation and anchoring depends on the internal data structures here, which I have not looked at in more than ten years...)