Tags: git, version-control, git-index, git-assume-unchanged

Undo git update-index --no-assume-unchanged (not working) and be able to see these local changes again


Update: My permissions issues somehow fixed themselves arbitrarily as now the exact same remote branch create command works, so (as you probably guessed) it wasn't caused by this tracking issue.

Now I'm just looking to fix the tracking issue. Thanks!

Original: I initially applied the following across several files:

    git update-index --assume-unchanged filename.py

referencing this documentation:

https://git-scm.com/docs/git-update-index#_using_assume_unchanged_bit

to hide certain files from my working directory changes (adding the files to .git/info/exclude did not seem to have any effect: the file was still visible despite the path being correct and it not having been committed).

I then ran into a problem pushing to a remote branch, which hadn't happened previously: the standard

    fatal: Could not read from remote repository.

    Please make sure you have the correct access rights
    and the repository exists. 

So, given I'd had no access problems right before running the update-index, I decided to revert that --assume-unchanged since I hadn't done any other messing about with my git repo.

I was able to view these assume-unchanged files using

    git ls-files -v | grep '^[a-z]'

I tried to revert the assume-unchanged by using

    git update-index --really-refresh

as directed here on reverting assume-unchanged across multiple files:

Undo git update-index --assume-unchanged <file>

This did not restore them, and I then tried

    git ls-files -v | grep '^[a-z]' | cut -c 3- | tr '\012' '\000' | xargs -0 git update-index --no-assume-unchanged

as per this recommendation:

Undo git update-index --assume-unchanged <file>

The files are still invisible and I am now also no longer able to view them by running

    git ls-files -v | grep '^[a-z]'

I also experimentally attempted this on one of my files:

    git update-index --skip-worktree file.py

as per here

https://git-scm.com/docs/git-update-index#_skip_worktree_bit

as I know skip-worktree takes precedence over assume-unchanged, so I was hoping to see some state change to that file. Still nothing.

How do I restore the files back to being visible and changes tracked in my working directory?

And as a bonus question, any idea how this could possibly have affected my access rights to create a new remote branch?


Solution

  • You have two different problems here. Your fatal: Could not read from remote repository error is quite independent of any flag bits you have set, or cleared, in your index. It implies that either:

    • for whatever protocol you're using to access another repository (http, https, ssh://, or git://), the connection to the other Git failed, or
    • the connection succeeded but then the other host that supposedly holds this other Git repository said "I see no Git repository here" (at the rest of the URL).

    If the connection itself failed, you should typically see additional information before the fatal: line telling you what happened, e.g., unable to resolve host name to IP address, user name and/or password required but missing, and so on. Use that to diagnose bad connections. If the connection succeeds, but the host says "no Git repo here", use whatever information you can get about the host to find out where the repository went (e.g., can you log in to the host directly and poke around?).


    Now, as to the assume-unchanged and skip-worktree bits, I'm afraid this gets a little complicated and technical. You will need to know some of this to use Git, so it's worth working through. I also note here that the advice to use --really-refresh is basically wrong. That does temporarily ignore the assume-unchanged bit for the purpose of updating the cached stat data in the index, but has no effect on the actual assume-unchanged bit. The way to clear that bit is indeed to use --no-assume-unchanged, as in the fancy pipeline you quoted above, that ends with | xargs -0 git update-index --no-assume-unchanged. The same holds for the skip-worktree bit, except that you clear it with --no-skip-worktree.[1]
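
As a minimal sketch of clearing both bits in bulk, the following builds a throwaway repository, sets each flag on one file, and then clears them with two separate git update-index runs. The file names a.py and b.py and the demo commit identity are made up for illustration. It forces LC_ALL=C for grep so the character range behaves predictably, and it clears the flags on every index entry, which should be harmless for entries that never had them set.

```shell
#!/bin/sh
# Sketch: set, list, and clear the assume-unchanged and skip-worktree bits
# in a scratch repository. All names here are illustrative assumptions.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
echo one > a.py
echo two > b.py
git add a.py b.py
git -c user.name=demo -c user.email=demo@example.com commit -q -m initial

git update-index --assume-unchanged a.py
git update-index --skip-worktree b.py

# In "git ls-files -v" output, a lowercase tag marks assume-unchanged and
# "S" marks skip-worktree. LC_ALL=C keeps [a-z] from matching uppercase.
flagged_before=$(git ls-files -v | LC_ALL=C grep -c '^[a-zS]' || true)

# Clear each bit with its own command, over every index entry.
git ls-files -z | xargs -0 git update-index --no-assume-unchanged
git ls-files -z | xargs -0 git update-index --no-skip-worktree

flagged_after=$(git ls-files -v | LC_ALL=C grep -c '^[a-zS]' || true)
echo "flagged before: $flagged_before, after: $flagged_after"
```

In a real repository you would run only the two clearing lines, from the top level of the work-tree.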

    What is the index and what are these git update-index commands doing?

    It's important to realize, when working with Git, that you have three of what I like to call active copies of each file at all times.[2] One of these three copies is whatever was in whichever commit you checked out. This copy is strictly read-only. You cannot change it: it's frozen into a commit and will stay there, in that commit, as long as that commit itself exists (essentially, forever).[3] That means that the frozen copy is safe: nothing you do will clobber it; you can always get it back. To view the frozen copy of a file named path/to/file.ext, use git show HEAD:path/to/file.ext. This HEAD:file syntax works with git show and lets you see the frozen copy in the current commit. (You can see the frozen copy in any other commit, too.)

    Now, these frozen copies are in a special, read-only, Git-only, compressed format. None of the other programs on your computer can access them directly (unless they've learned way too much about the insides of Git). And, since they can't be changed, they are fine for archival but quite useless for getting any actual new work done. So when you git checkout some particular commit, Git extracts all the frozen files, turning them back into normal everyday files, which you can see and work with as usual. You can change these ordinary read/write files, or do anything you want with them, including adding new ones and removing existing ones, all in the usual way that you do anything with your computer.

    These usable, workable files are in what Git calls your work-tree. That's where you get your work done. That's the second copy of every file. You don't have to do anything special here: these are just files. If you want to look at the file named file, use whatever tools (editor, file viewer, etc) that you always use to look at the file named file. Its name is just file.

    The third copy of every file is where Git gets sneaky. This is where the index comes in. The index is also called the staging area, or sometimes—rarely these days—the cache. These are all names for the same thing. The third copy of every file is in the index, and you can view the index copy of the file named file using git show :file. That is, git show takes the colon in front to mean: show me the copy that's in the index. The git ls-files command you were using in the fancy pipeline also lists what's in the index.

    The third copy of the file is in the format that Git uses for permanent frozen file storage, but is not quite frozen. You can, at any time, overwrite it. You can replace the index copy of file with any new content you like. You do this with git add, which takes the work-tree copy (presumably you've changed that copy at this point) and replaces the index copy with that version. Or, you can, if you like, remove the index copy of file, using git rm, which removes both the index copy and the work-tree copy.
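
To make the three copies concrete, here is a small sketch in a scratch repository (the file name and the demo identity are placeholders I made up). It commits one version, stages a second, leaves a third in the work-tree, and then reads each copy back with the commands described above.

```shell
#!/bin/sh
# Sketch: one file, three different active copies.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
echo v1 > file
git add file
git -c user.name=demo -c user.email=demo@example.com commit -q -m initial

echo v2 > file
git add file      # the index copy now holds v2
echo v3 > file    # the work-tree copy now holds v3

head_copy=$(git show HEAD:file)   # frozen copy in the current commit
index_copy=$(git show :file)      # copy in the index / staging area
work_copy=$(cat file)             # ordinary file in the work-tree
echo "HEAD=$head_copy index=$index_copy work-tree=$work_copy"
```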

    Technically, what's in the index is just a lot of cache data about the file, plus a bunch of flags, plus a reference to a stored frozen-format copy of the file. When you first check out some commit, so that your HEAD and work-tree copies match, your index copy really just re-uses the frozen HEAD copy directly, so this takes no extra space at all. When you use git add to overwrite it, Git takes the work-tree copy, compresses that down into a frozen, ready-for-permanent-storage copy, puts that copy somewhere,[4] and updates the index reference.
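
You can see the "reference" part of an index entry for yourself. The sketch below, again in a scratch repository with made-up names, reads the blob hash ID out of git ls-files -s and compares it with the blob in HEAD, first right after a commit and then after a git add of new content.

```shell
#!/bin/sh
# Sketch: the index entry refers to a blob by hash ID.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
echo v1 > file
git add file
git -c user.name=demo -c user.email=demo@example.com commit -q -m initial

# "git ls-files -s" prints: mode, blob hash, stage number, then the path.
index_blob=$(git ls-files -s file | cut -d' ' -f2)
head_blob=$(git rev-parse HEAD:file)    # blob used by the frozen HEAD copy

echo v2 > file
git add file                            # writes a new blob, repoints the index
new_index_blob=$(git ls-files -s file | cut -d' ' -f2)
worktree_hash=$(git hash-object file)   # hash of the current work-tree content

echo "before add: index=$index_blob HEAD=$head_blob"
echo "after add:  index=$new_index_blob work-tree=$worktree_hash"
```

Right after the commit the two hashes should be identical, which is the "re-uses the frozen HEAD copy" case; after git add the index should instead point at the blob for the new content.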

    This is one of the secrets that makes git commit so fast. Git does not have to look at your work-tree. It does not have to re-compress all your files. They're already there, in your index, ready to go. All git commit has to do is package up the pre-frozen files into a commit. It just commits whatever is in your index at the time you run git commit. Hence, a good way to think of this is that the index is your proposed next commit. What git add and git rm do is update your proposed commit. Running git commit just snapshots what's in your index—in the staging area, ready to be committed—even if that's mostly the same as the previous commit. The git add and git rm commands are what actually update the index.

    This is why and how each commit is a full snapshot of every file. Any file that you don't update is still in the index ("on stage") and will be in the next commit.

    git status uses the flags

    Suppose you have 3000 files in your currently-checked-out commit (and hence in your work-tree), and you change one in your work-tree and git add it to get it updated in your index / staging-area. If you run git status now, git status doesn't bother to tell you that 2999 of your 3000 files are the same, as that's not useful information. What git status tells you is that the one file is updated.

    The way git status does this, in principle at least, is to run two separate comparisons:

    • First, git status will compare each file in the HEAD commit to the copy in the index. For every file that is the same here, it says nothing. But if the file is different here, it says: staged for commit.

    • Next, git status will compare each file in the index to the copy in the work-tree. Again, if the files are the same, it says nothing. If the files are different, it says: not staged for commit.
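
The same two comparisons can be run by hand with git diff, which may help when git status output is confusing. This is a sketch in a scratch repository; staged.txt and unstaged.txt are names invented for the example.

```shell
#!/bin/sh
# Sketch: the two comparisons behind "git status", run separately.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
echo one > staged.txt
echo one > unstaged.txt
git add staged.txt unstaged.txt
git -c user.name=demo -c user.email=demo@example.com commit -q -m initial

echo changed-and-added > staged.txt
git add staged.txt                  # updates the index copy
echo changed-only > unstaged.txt    # only the work-tree copy changes

head_vs_index=$(git diff --cached --name-only)   # "staged for commit"
index_vs_worktree=$(git diff --name-only)        # "not staged for commit"
echo "staged: $head_vs_index"
echo "not staged: $index_vs_worktree"
```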

    When git status does this comparing, the first part goes pretty fast, due to Git's internal representation of file-contents as blob hash IDs. So it really, literally, just compares every file. It only takes a few milliseconds to decide that 2999 out of 3000 files are the same and one is different. But the second part is slow: actually comparing all 3000 files could take several seconds!

    So, git status cheats. This is where the cache aspect of the index comes into play. Each index entry holds a reference to the frozen-format file that's ready to be committed; but it also holds some data resulting from an OS lstat system call.[5] Git can do another lstat system call on the file in the work-tree. Under most conditions, the resulting stat data matches up with what Git saved earlier if and only if the file in the work-tree is still the same as the copy that Git has in the frozen-format, as cached by the index entry. If you've modified the work-tree copy, the OS will have updated the stat data as well.

    So, imagine you are git status, comparing each file in the index to its copy in the work-tree, so that you can say not staged for commit if necessary. You could open each work-tree file and read all the way through it, and compare its contents to what you get if you de-compress the frozen index copy. That will tell you if they're the same or different, but wow, that's a lot of work, it might take seconds. But you have this cached stat data, and if you compare the stat data with the result of another lstat, well, that takes far less work and time. So you do that instead. If the lstat results match the cached results, the file must be the same, and you can say nothing and move on to the next file.
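
If you want to peek at the cached stat data, git ls-files --debug prints it for each index entry. The sketch below pulls out just the cached size and compares it with the size the OS reports now. It assumes the debug output contains a "size:" field, which is what the Git versions I have looked at print, and the file name is a placeholder.

```shell
#!/bin/sh
# Sketch: compare the size cached in the index with the current file size.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
printf 'hello\n' > file
git add file
git -c user.name=demo -c user.email=demo@example.com commit -q -m initial

# Extract the number after "size:" from the debug listing for this entry.
cached_size=$(git ls-files --debug file | sed -n 's/.*size: \([0-9][0-9]*\).*/\1/p')
# Arithmetic expansion strips any padding that wc may add.
actual_size=$(( $(wc -c < file) ))
echo "cached size: $cached_size, actual size: $actual_size"
```

Git compares more than the size (timestamps and other fields as well), so matching sizes alone do not prove the file is unmodified; this only shows where the cached numbers live.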

    But in fact, each lstat system call is also pretty slow. Sure, it's thousands of times faster than reading through every file, but it could still take hundreds of microseconds. And what if the OS has some really dreadfully slow lstat that takes 3 milliseconds? Doing that on 3000 files, if each one takes 3 milliseconds, will take nine seconds, and that's far too long!

    Git has a flag for that. The --assume-unchanged flag, which is a settable flag in each index entry, tells Git: don't bother calling lstat on this work-tree copy, just assume it matches the cached stat data. It has a second, slightly more powerful flag, --skip-worktree, that achieves the same result. (It's slightly more powerful because some commands, such as git update-index --really-refresh, will ignore the first flag, but not the second one.)

    If you set either bit, operations that would compare the index's cached stat data against real stat data from the work-tree, to tell if the file is really modified, just assume that the file isn't modified. Clear both bits and these Git operations will call lstat after all. Then git status should see an update to the file, as long as the stat data that the OS returns is also updated. There are OS-level tricks that defeat that, but you can usually defeat these OS-level tricks using touch:

    touch path/to/file
    

    makes sure that the stat data on path/to/file is now newer than any cached stat data that Git might be holding.

    This picture should be clear enough if a bit complicated: the index / staging area holds cached data about each work-tree file, from a previous lstat system call. If the cached data match what the OS reports on a new lstat call, the index copy must match the work-tree copy. If you set the flag bits, Git doesn't bother doing the lstat call: it just assumes the two sets of data match, so that the index copy matches the work-tree copy, whether or not that's really true. Clear the bits and Git goes back to calling lstat and gets—we hope—an accurate report from the OS.
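
Here is a short sketch of that whole cycle in a scratch repository (f.txt and the demo identity are invented for the example): with the assume-unchanged bit set, a real modification goes unreported by git status, and it shows up again once the bit is cleared.

```shell
#!/bin/sh
# Sketch: a modification hidden by assume-unchanged, then made visible again.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
echo v1 > f.txt
git add f.txt
git -c user.name=demo -c user.email=demo@example.com commit -q -m initial

git update-index --assume-unchanged f.txt
echo "v2 with more text" > f.txt      # a real change to the work-tree copy

hidden=$(git status --porcelain)      # Git assumes the file still matches

git update-index --no-assume-unchanged f.txt
visible=$(git status --porcelain)     # Git calls lstat again and sees it

echo "with the bit set: [$hidden]"
echo "with the bit cleared: [$visible]"
```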

    This picture is no longer entirely true, as Git now also has the ability to use a file system monitor to avoid calling lstat unnecessarily. But that's a topic for another question entirely.


    [1] Note that the fancy pipeline assumes that LC_COLLATE is set to C, at least for versions of grep that obey the LC_COLLATE setting. That is:

    git ls-files -v | grep '^[a-z]'
    

    may list every file depending on LC_COLLATE. It also lists the --skip-worktree files, but you must unset that flag with a separate git update-index --no-skip-worktree command. This is one reason I wrote git-flagged. (Listing too many files due to grep matching too much is harmless: you'll just invoke some git update-index commands that didn't really need to be run.)

    I have not made my git-flagged script support the new fsmonitor valid / invalid bits. If your system is using fsmonitor, and that's going wrong, you have a bigger problem and perhaps should disable fsmonitor globally, via git config and the core.fsmonitor setting.

    [2] This assumes a normal (not --bare) repository, and that you have not added additional work-trees using git worktree add. Each work-tree you add with git worktree add gets its own index and work-tree and its own HEAD, so each one gets another three of these active copies.

    [3] Once you make a commit and it has acquired a particular hash ID, you can use that hash ID to see whether the commit still exists. If it does exist (and it probably does), then the files you froze into it also still exist in that frozen form.

    It's a bit hard to actually get rid of a bad commit. It can be done, so commits aren't really necessarily forever, but that's kind of the way to think of them.

    [4] The "somewhere" is actually straight into the repository. If you commit this copy of the file, the frozen copy gets used; if not, it's often just leftover junk that Git eventually cleans out. Unless you are constantly extremely short on disk space, you don't need to worry about these things that git fsck will show as dangling blobs. Just let Git clean them out later, on its own.

    [5] This refers specifically to the POSIX lstat system call, which produces stat data. If your underlying OS doesn't have or use stat data, Git still needs to cache something, and will use some sort of synthetic stat data that needs to be good enough to make the rest of this work.