files in .gitignore still show up when doing git status


In my .gitignore (which is in the root of my project repo - the same directory that the hidden .git directory is in) I have just one line:

config.php

When I do git status it shows unstaged changes to that file (in the root directory), which I wouldn't expect.

When I do git ls-files . --ignored --exclude-standard --others nothing is returned.

This sounds exactly like ".gitignore Files Still Show Up in Untracked List", except that in that question the .gitignore was not in the root directory, whereas mine is.

Any ideas as to what's wrong? I'm running Windows 11 Pro.


Solution

  • In general what this means is that config.php is already tracked. A tracked file is never ignored. (In some specific cases, it means your .gitignore file is encoded in UTF-16 or similar, in which case, fix that first, then check again.)

    It's extremely important to understand exactly what a tracked file is, though. A file that is tracked right now won't necessarily be tracked tomorrow. So: what is a "tracked" file?

    All about Git's index

    This is a bit long, but it's worth going through:

    • Git stores commits (not files but rather whole commits). A commit then contains files, but each commit has a full snapshot of every file, de-duplicated within and between commits, so that they don't take up lots of space despite the fact that files repeat over and over from commit to commit.

    • The commits are entirely read-only. Nothing inside a commit can change, ever. (The --amend flag to git commit is a lie: a little white lie, to be sure, but still, a lie.) The files inside the commit are compressed and de-duplicated and in a weird format that only Git can read. As such, we literally cannot work on or with commits. They're strictly archival.

    • Git must therefore extract a commit before we can do any work on it. That's actually entirely normal for almost any version control system: most of them do this "extract a commit to a working area" trick. Git calls the working area your working tree, and it's pretty simple: it's an ordinary directory (or folder, if you prefer that term) on your computer, holding ordinary sub-folders and files in the ordinary way. There's nothing special or Gitty about these! In fact, these files are not in Git at all. Git simply creates them as needed when you extract the commit (and then removes them if appropriate as you switch to some other commit).

    This part is pretty straightforward, and if Git stopped here, we wouldn't get all confused by its weirdness. But Git doesn't stop here. Instead of having just two copies of each file—the frozen-for-all-time version in the commit, that only Git can read, and the usable version in your working tree so that you can get work done—Git keeps three copies of each file:

    • there's the frozen (and de-duplicated) file in the commit;
    • there's a half-frozen—pre-de-duplicated, but not actually frozen—file in Git's index; and
    • there's a usable copy in your working tree.

    It's that middle copy, in between the commit and your working tree, that throws everything for a loop. I should say "copy", because it's pre-de-duplicated, and since it just came out of the commit, it's a duplicate and therefore takes no space (see footnote 1).

    You can't see the index copy directly (see footnote 2), but its presence is the literal definition of a tracked file. Any file that is in the index right now is tracked. Any file that is not in the index right now is untracked. And that's it! Well, except for the mechanisms by which a file enters or leaves the index, and of course the purpose of a tracked file.
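To make the "in the index right now" definition concrete, here is a minimal sketch run in a throwaway repository. The helper name is_tracked and the file names are made up for illustration; the only Git feature relied on is git ls-files --error-unmatch, which fails when a path has no index entry.

```shell
set -e

# Throwaway repository so the sketch never touches a real project.
repo=$(mktemp -d)
cd "$repo"
git init -q

echo '<?php // settings' > config.php
echo 'scratch' > notes.txt
git add config.php              # config.php now has an index entry; notes.txt does not

# Hypothetical helper: succeeds only if the path is in the index.
is_tracked() {
    git ls-files --error-unmatch -- "$1" >/dev/null 2>&1
}

if is_tracked config.php; then config_state=tracked; else config_state=untracked; fi
if is_tracked notes.txt;  then notes_state=tracked;  else notes_state=untracked;  fi

echo "config.php: $config_state"
echo "notes.txt:  $notes_state"
```

Note that no commit is needed: git add alone is enough to make config.php tracked, because tracking is purely about the index.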

    The index is either so important, or so badly named (wtf does "index" mean anyway?), that this thing in Git actually has three names. I like to use "index" here because it's meaningless, and the index takes on an expanded role during git merge, but when we're not in the middle of a merge, the index has a relatively simple role: The index holds the files you're planning to commit in your next commit. Because this copy of each file can be overwritten—or even removed entirely—you get to copy updated working tree files back into the index.

    Git calls the act of copying a working-tree copy of a file into the index staging the file. Hence, another name for the index is the staging area. You stick the files into the staging area, and now they are "staged for commit".

    In fact, there was already a file there before, also staged for commit—but it was a pre-de-duplicated duplicate of the file that's already in the current commit. So git status didn't mention that file. You'll see it in git ls-files --stage output though, if you're crazy enough to run that command (see footnote 2). The git status command only mentions the file as staged when the index copy is different from the current commit copy. That way you know which files you have changed in your proposed next commit.

    An unstaged file is one where the index copy doesn't match the working tree copy. Since there are actually three copies, it's possible to have all three out of sync: just make a small change, git add the file, and make another small change. Now one file is both staged and unstaged at the same time. That just means the copies differ.
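The "all three out of sync" case is easy to reproduce. The sketch below assumes a scratch repository with a made-up file.txt and a placeholder identity for the commit; it uses git show HEAD:path and git show :path to print the commit copy and the index copy respectively.

```shell
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"   # placeholder identity for the scratch repo
git config user.name "Example User"

echo "one" > file.txt
git add file.txt
git commit -q -m "initial"                 # commit, index and working tree all agree

echo "two, staged" > file.txt
git add file.txt                           # index copy is now "two, staged"
echo "three, only in the working tree" > file.txt

# Porcelain status: first column compares commit vs index (staged),
# second column compares index vs working tree (unstaged).
status=$(git status --porcelain -- file.txt)
echo "$status"

head_copy=$(git show HEAD:file.txt)        # frozen copy in the current commit
index_copy=$(git show :file.txt)           # copy in the index / staging area
work_copy=$(cat file.txt)                  # ordinary file in the working tree
printf 'commit: %s\nindex:  %s\nwork:   %s\n' "$head_copy" "$index_copy" "$work_copy"
```

The status line should read MM file.txt: one M for "staged" and one for "unstaged", which is just Git's way of saying the three copies differ.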

    The third name for this index / staging-area, which you mostly see in flags, is the cache. This shows up in the command git rm --cached for instance.

    We already know that if you want to copy a file from the working tree back into the index—to stage it for commit, whether or not it's a new file—you run git add file. If you'd like to remove a file from both the index / staging-area and your working tree, you can use git rm file.

    But what if you want to remove it from the staging area, but not from your working tree? To Git, this means "remove from index, don't touch working tree". Here the last (oldest and probably worst) name for the index / staging-area shows up because this command is spelled git rm --cached. It removes the index copy only.
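As a rough side-by-side of the two removal commands, assuming a scratch repository and two made-up file names:

```shell
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"   # placeholder identity for the scratch repo
git config user.name "Example User"

echo "a" > both.txt
echo "b" > index-only.txt
git add both.txt index-only.txt
git commit -q -m "add two files"

git rm -q both.txt                  # removes the index entry AND the working tree file
git rm -q --cached index-only.txt   # removes the index entry only

tracked=$(git ls-files)             # lists index entries; neither file should remain
if [ -e both.txt ];       then both_on_disk=yes;       else both_on_disk=no;       fi
if [ -e index-only.txt ]; then index_only_on_disk=yes; else index_only_on_disk=no; fi

echo "tracked files: '$tracked'"
echo "both.txt on disk: $both_on_disk"
echo "index-only.txt on disk: $index_only_on_disk"
```

Either way the next commit will record the file as deleted; the only difference is whether your working tree copy survives.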


    Footnote 1: Any index entry takes a minimum amount of space; most take roughly 100 bytes. So for every file in the commit, you spend ~100 bytes on the index entry. If you have 1000 files that's about 100 kB, which on a modern drive isn't even noticeable. If you have 10,000 files, it's still just about 1 MB, i.e., not noticeable in a 250 or 500 GB SSD, much less on a 4+ TB hard drive.

    Footnote 2: Actually, you can sort of see it, with git ls-files --stage. This shows you the file's name and a Git hash ID. You can't see the file's contents this way, but you can see every file in Git's index. That can be quite a lot of files, and this command isn't meant for normal human use: blasting 1000 or 10000 lines of output at you isn't all that useful, usually.


    But why bother with an index at all?

    Other version control systems don't have this thing. It's not, in any strict sense of version control, required. But Git has it, and you have to bother with it. Why does Git have it? Only Linus Torvalds could say for sure, but we can note this: when you run git commit, Git simply freezes into a new commit all of the pre-de-duplicated files in the index. This usually goes very fast, so fast that some of us programmers, first introduced to Git, thought that it couldn't possibly be working (see footnote 3).

    In short—if it's not too late—the existence of this index / staging-area gives us a place to arrange the next commit, that is separate from the working tree. This frees up the working tree to hold files that aren't in the index at all and won't be in the next commit. Such a file is an untracked file.

    Since Git makes the next commit from the index, and an untracked file is any file that isn't in the index, the untracked files won't get committed. But here's the remaining problem: git status and other Git commands will complain about these untracked files. This is where .gitignore comes in.


    Footnote 3: In other more traditional version control systems, you'd run their equivalent of "commit" and then go for a break, because nothing was going to happen for minutes. Git finished the commit in milliseconds and that was just astonishing. Git can also switch from commit to commit, or branch to branch, far faster than these traditional systems could, and again that involves Git's index. If Linus wasn't using it specifically because it made Git fast, "making Git fast" was a huge bonus, at least.


    .gitignore is misnamed

    The name .gitignore makes it sound like the files listed in it will be ignored. That's not the case: the files that won't be in the next commit, won't be in the next commit because they're currently not in Git's index / staging-area.

    If we check out an existing commit that has some file, a copy of that file goes into Git's index / staging-area, and that file is now by definition tracked. Listing it in .gitignore will have no effect at all. Removing that file from Git's index with git rm --cached will make the file untracked, at which point the .gitignore entry starts to matter. (Plain git rm takes it out of the index too, but it also deletes the working tree copy.)
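Applied to the situation in the question, that plays out roughly as follows. This is a sketch in a scratch repository, assuming config.php was committed at some point before the .gitignore line was added; the file contents and commit messages are invented.

```shell
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"   # placeholder identity for the scratch repo
git config user.name "Example User"

# config.php gets committed, so it is in the index: tracked.
echo "<?php // v1" > config.php
git add config.php
git commit -q -m "add config.php"

# Listing it in .gitignore afterwards changes nothing for a tracked file.
echo "config.php" > .gitignore
echo "<?php // v2" > config.php
before=$(git status --porcelain -- config.php)
echo "before: '$before'"

# Drop the index copy only, then commit that removal along with .gitignore.
git rm -q --cached config.php
git add .gitignore
git commit -q -m "stop tracking config.php"

# The file is still on disk, now untracked, so the ignore rule applies.
echo "<?php // v3" > config.php
after=$(git status --porcelain)
echo "after: '$after'"
```

One consequence worth knowing: the new commit records config.php as deleted, so anyone else who checks out that commit will have their working tree copy removed by Git.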

    If we have some untracked file, git status will whine about it. If it's supposed to be untracked, it would be nice to have git status stop whining. That's the first part of what .gitignore does. It's more .git-dont-whine-about-these-files-when-they-are-untracked.

    But we also have en-masse "add everything" commands. In particular, git add . means add every file in the current directory and all sub-directories. For untracked files that shouldn't become tracked, we'd like this kind of operation to skip them. That, too, is something .gitignore accomplishes. So maybe it should be .git-don't-complain-about-these-files-if-they-are-untracked-and-when-they-are-untracked-dont-add-them-with-an-en-masse-git-add-command-either. But ... well, are you willing to type all that in? 😀 I'm not, so .gitignore it is.
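Both halves of that job (no whining, no en-masse adding) can be seen in a small sketch. The file names here are invented, and the repository is a scratch one.

```shell
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q

echo "secret.txt" > .gitignore
echo "keep me"    > tracked.txt
echo "hidden"     > secret.txt

# Part one: git status lists the untracked files, but not the ignored one.
status=$(git status --porcelain)
echo "$status"

# Part two: an en-masse add skips the ignored untracked file.
git add .
staged=$(git ls-files)
echo "$staged"
```

After git add . the index holds .gitignore and tracked.txt, while secret.txt stays out of it and therefore out of the next commit.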

    Note two more things here:

    • .gitignore rules can get pretty complicated, since there's ways to un-ignore things and ways to specify anchored vs unanchored paths and so on. You can put .gitignore files in subdirectories, and those apply to the subdirectories and their children, but not to higher level directories. But mostly it's pretty simple: don't make an untracked file tracked and don't gripe about it as untracked.

    • It's sometimes impossible to tell whether a file is untracked-and-ignored or tracked-and-unmodified. If there's a file named foonly in your working tree, and git status says nothing about this file... is it tracked and unmodified? Or is it untracked and ignored? You can't tell from git status alone.

    The git check-ignore command, especially with the -v flag, is good at figuring out whether some file is ignored, and if so, why. And, as a sort of last resort, git ls-files --stage can tell you if a file is in the index right now (consider running git ls-files --stage path/to/file to limit its output). The --stage flag also gives you the raw stage numbers for a file that's undergoing merge-conflict-resolution, but that's a pretty advanced topic.
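Here is a minimal sketch of telling the two silent cases apart, again in a scratch repository. The file debug.log and the *.log pattern are made up; foonly is the tracked, unmodified file from the example above.

```shell
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"   # placeholder identity for the scratch repo
git config user.name "Example User"

echo "tracked content" > foonly
echo "*.log" > .gitignore
git add foonly .gitignore
git commit -q -m "initial"
echo "noise" > debug.log                   # untracked and ignored

status=$(git status --porcelain)           # silent about both foonly and debug.log

# -v prints the file, line number and pattern of the rule that matched.
why=$(git check-ignore -v debug.log)
echo "$why"

# check-ignore exits non-zero for a path that is not ignored
# (by default it does not report tracked files).
if git check-ignore -q foonly; then foonly_ignored=yes; else foonly_ignored=no; fi

# The last resort: is there an index entry for the path?
in_index=$(git ls-files -- foonly)
echo "foonly ignored: $foonly_ignored, index entry: $in_index"
```

The -v output has the shape source:line:pattern, then a tab, then the path, which is handy when several .gitignore files or a global excludes file are in play.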