git, git-diff, git-status

Check for uncommitted changes in another branch x, without checking out branch x


I want to determine if another branch has uncommitted changes, but I don't want to check that branch out. Basically I want to find out if another branch has a "dirty index". This means either staged or unstaged changes that are uncommitted.

Is this possible? From what I have learned so far, I don't think you can use git status on any branch other than the one you have currently checked out. But what about git diff?

Update: I just realized that the only branch that can have a "dirty index" is your current branch, unless you use git-stash (I think). So perhaps this question is better formulated as - how can I check if any branch has stashed/uncommitted changes?


Solution

  • Note: this took a while to write, and you updated your question in the meantime. This answer is a lot longer than it might have needed to be.


    I want to determine if another branch [i.e., a branch other than the current branch] has uncommitted changes ...

    The answer is that it doesn't.

    In fact, no branch contains uncommitted changes. It's literally impossible, and asking the question suggests a mental model of Git that differs from how Git actually works.

    Basically I want to find out if another branch has a "dirty index".

    There is only one index. Well, that's not quite true, but start with that! There is also only one work-tree (which is also not quite true, but again, let's start with that).

    ... [update] So perhaps this question is better formulated as - how can I check if any branch has stashed/uncommitted changes?

    Here, you may wish to use the helper routines in $(git --exec-path)/git-sh-setup. View this file in your favorite editor, for instance.
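
    As a minimal sketch of what such a check can look like, the function below follows the general approach of require_clean_work_tree in git-sh-setup, using plumbing commands. The name is_clean is hypothetical. It inspects only the index and work-tree of the directory it is run in, and it does not consider untracked files.

```shell
# is_clean: hypothetical helper. Succeeds (exit 0) when the index and the
# work-tree both match HEAD. Untracked files are not considered.
is_clean() {
    # Refresh cached stat data so that merely touched files do not count.
    git update-index -q --ignore-submodules --refresh >/dev/null 2>&1 || true
    # Unstaged changes: the work-tree differs from the index.
    git diff-files --quiet --ignore-submodules || return 1
    # Staged changes: the index differs from the HEAD commit.
    git diff-index --cached --quiet --ignore-submodules HEAD -- || return 1
    return 0
}
```

    Run inside a work-tree, is_clean returns nonzero as soon as there is a staged or unstaged modification to a tracked file.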

    How Git actually works

    The way Git actually works is a little bit complicated. The main body of any Git repository is a database that contains four types of Git objects, of which the most important for our purposes is the commit. We can think of each Git repository as being composed of a series of commits. Each commit has a unique hash ID—one of those big ugly hexadecimal numbers like b7bd9486b055c3f967a870311e704e3bb0654e4f—that Git uses to extract the contents of the commit object. The object itself is pretty small, but contains references to other hash IDs that eventually get you all the files in the snapshot of that commit.

    Meanwhile, branch names like master simply hold hash IDs. Each branch name holds exactly one hash ID. So a name like master holds an ID like b7bd9486b055c3f967a870311e704e3bb0654e4f. Git calls this the tip commit of that branch. The name-to-hash-ID mapping is a second database, sitting alongside the main hash-ID-to-object database.
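
    To see this name-to-hash-ID mapping directly, a small sketch such as the following can help. The function name branch_tips is hypothetical; git for-each-ref and its format fields are standard.

```shell
# branch_tips: hypothetical helper. Prints one line per local branch:
# the tip commit's hash ID, a space, then the short branch name.
branch_tips() {
    git for-each-ref --format='%(objectname) %(refname:short)' refs/heads/
}
```

    Each output line pairs one branch name with the single hash ID it currently holds.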

    The commit itself contains another hash ID—well, more than one, but let's take a look at this commit. The git cat-file -p command pretty-prints the contents of an object.

    $ git cat-file -p b7bd9486b055c3f967a870311e704e3bb0654e4f | sed 's/@/ /'
    tree 1fd4a47af4942cbdee0bdcb4375612ab521a4a51
    parent 5571d085b3c9c2aa9470a10bcf2b8518d3e4ec99
    author Junio C Hamano <gitster pobox.com> 1531941857 -0700
    committer Junio C Hamano <gitster pobox.com> 1531941857 -0700
    
    Third batch for 2.19 cycle
    
    Signed-off-by: Junio C Hamano <gitster pobox.com>
    

    The tree line gets us the snapshot for that commit. The parent line tells us which commit comes before this commit. The rest of the lines contain the remaining metadata, including the name of the person who made the commit, and the log message.

    Because this stuff is in a commit, it's all committed. There aren't any changes either: the tree listed here holds a complete snapshot of all the files. We can view the tree directly, again using git cat-file -p, but it's long and kind of boring:

    100644 blob 12a89f95f993546888410613458c9385b16f0108    .clang-format
    100644 blob 1bdc91e282c5393c527b3902a208227c19971b84    .gitattributes
    [snippage]
    100644 blob 536e55524db72bd2acf175208aef4f3dfc148d42    COPYING
    040000 tree 814822a9a0a75e17294704b37950c30361401a85    Documentation
    [lots more snippage]
    100644 blob ec6e574e4aa07414b9a17bb99ddee26fd44497de    xdiff-interface.c
    100644 blob 135fc05d72e8f066a63902785d12485a656efa97    xdiff-interface.h
    040000 tree 12edaa3d770f84e31ee58826eea93ea6ca64d939    xdiff
    100644 blob d594cba3fc9d82d94b9277e886f2bee265e552f6    zlib.c
    

    If we follow each of the sub-tree lines, we eventually collect up the entire snapshot for commit b7bd9486b055c3f967a870311e704e3bb0654e4f (whose tree is 1fd4a47af4942cbdee0bdcb4375612ab521a4a51) in the Git repository for Git. Each blob line gets us the hash ID of a file that goes with that commit, while the name of the file is at the end of the line. So this commit in Git has a version of zlib.c that has hash ID d594cba3fc9d82d94b9277e886f2bee265e552f6, and we can view that file as well with git cat-file -p if we want:

    $ git cat-file -p d594cba3fc9d82d94b9277e886f2bee265e552f6 | head
    /*
     * zlib wrappers to make sure we don't silently miss errors
     * at init time.
     */
    #include "cache.h"
    [rest snipped]
    

    These are all frozen / read-only

    Everything stored in the main repository body, under these hash ID keys, is completely, entirely, 100% read-only. The reason for this is simple: the hash ID key is actually a cryptographic checksum of the data! (Well, of the data plus a very small header giving the object's type and size.) When you give Git a hash ID as a key, and Git uses that to extract the object from the repository database, Git does a consistency check: it re-checksums the retrieved data to make sure it matches the hash ID it used in the first place. These two must match: if they don't, Git knows that the object has become corrupted somehow, e.g., due to disk failure, or something along those lines.
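
    The "header plus data" checksum can be reproduced by hand for a blob. The sketch below assumes a repository that uses SHA-1 hash IDs and a system that provides sha1sum (from GNU coreutils); the function name blob_id is hypothetical.

```shell
# blob_id: hypothetical helper. Computes the hash ID Git would assign to a
# file's contents as a blob, without calling Git at all.
blob_id() {
    # Size of the file in bytes (tr strips padding that some wc versions add).
    size=$(wc -c < "$1" | tr -d ' ')
    # Header is "blob <size>" followed by a NUL byte, then the raw contents.
    { printf 'blob %s\0' "$size"; cat "$1"; } | sha1sum | cut -d' ' -f1
}
```

    For the same file, the result should agree with what git hash-object prints, which is why any change to the contents necessarily produces a different hash ID.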

    Names find tip commits; commits find their parents, to make a backwards chain

    As we noted above, a name like master contains the commit hash ID of the tip commit of that branch. If we use the hash ID to retrieve the commit itself, we get a parent line. The parent tells us the hash ID of the commit that was the tip of the branch, at some earlier time. We say that each of these things—the branch name, and the commit itself—points to a commit, and we can draw these pointers as arrows:

    ... <-parent  <-tip   <--master
    

    Git can start at the tip commit, and do something with that commit. Then Git can use the parent ID to find the previous commit, and do something with it. The parent has its own parent, and Git can look at that commit, and so on. Eventually the entire chain leads back to a point where the action stops, typically at the very first commit ever made. If there are two branch names, with different commits at their branch tips, in a pretty small repository, we can draw out the entire thing like this:

    A--B--C--D   <-- master
           \
            E--F--G   <-- develop
    

    The internal linkages always point backwards, from commit to parent. Since the objects themselves are read-only, no commit remembers its children: those are created too late. But all children always remember all of their parents. When we make a merge commit, such as merging G into D, the merge remembers both of its parents:

    A--B--C--D------H   <-- master
           \       /
            E--F--G   <-- develop
    

    and the same thing happens as always happens whenever we make a new commit: the current branch name changes, so that master points to H instead of D. The first parent of a merge commit, and the only (and thus first) parent of an ordinary non-merge commit, is the commit that was the tip just a moment ago.

    To put it another way: Branch names move; commits stay put.
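
    The backwards walk from a tip commit through its parents can be sketched by hand as below. The function name walk_first_parents is hypothetical, and the loop follows only the first parent of each commit; in practice git rev-list and git log do this work, including all parents of merges.

```shell
# walk_first_parents: hypothetical helper. Starting from the commit named by
# $1, prints each hash ID while following first-parent links backwards.
walk_first_parents() {
    commit=$(git rev-parse --verify --quiet "$1^{commit}") || return 1
    while :; do
        printf '%s\n' "$commit"
        # "<commit>^" names the first parent; a root commit has none, so
        # rev-parse fails there and the walk stops.
        commit=$(git rev-parse --verify --quiet "$commit^") || break
    done
}
```

    Running walk_first_parents master starts at whatever commit master names at that moment; the commits themselves never change, only the starting point does.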

    Everything committed is in a special, Git-only format

    The blob objects, the things that contain the files, that we saw above, identified by hash ID, are in a special, Git-only, compressed format. (The commits and trees are also compressed, though this is less important, since only Git really uses those directly.) This is no good for our own use, though, so we need a place where Git can extract files into their ordinary uncompressed format. These files also need to be changeable, and the Git-only frozen blob objects are not.

    Hence, each repository generally comes with one (1) work-tree, into which Git extracts and decompresses committed files. The initial contents of the work-tree come out of one of the frozen commits.

    Git does, of course, need a way to make new commits. There is a similar version control system (Mercurial) that makes new commits using the work-tree, and Git could have done that, but Git does not do that. Git has, instead, this thing called, variously, the index, or the staging area, or the cache. The role of the index can get fairly complicated—Git expands it quite a lot during conflicted merge operations, for instance—but it's perhaps best described as the place you build up the next commit you will make. This is its role as staging area.

    New commits are complete snapshots, not just changes! So the index / staging-area holds all the files from the current commit. Git simply copies the current commit to the index, which makes the index contain all the files. The files in the index are still in the special Git-only format, but—this is the crucial difference from the committed copies—they are now writable.
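
    One way to observe that the index holds its own copy of every file is to compare the blob hash ID recorded in the index with the one recorded in the current commit. This is a sketch only; the function names index_blob and head_blob are hypothetical.

```shell
# index_blob: hypothetical helper. Prints the blob hash ID that the index
# currently records for the path in $1 (field 2 of "mode hash stage<TAB>path").
index_blob() {
    git ls-files --stage -- "$1" | cut -d' ' -f2
}

# head_blob: hypothetical helper. Prints the blob hash ID that the current
# commit records for the same path.
head_blob() {
    git rev-parse "HEAD:$1"
}
```

    Right after a clean checkout the two agree. After editing a file and running git add on it, index_blob reports a new hash ID while head_blob still reports the frozen one.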

    We can now observe the process of checking out a commit and making a new commit

    Let's start with a six-commit repository. In this repository, let's add the remote-tracking names (origin/*) as well, and attach the name HEAD to master:

    A--B--C--D   <-- master (HEAD), origin/master
           \
            E--F   <-- origin/develop
    

    Now let us do this:

    $ git checkout develop
    

    (and let's assume that the repository has just been cloned and has everything "clean"). The branch name develop does not exist yet! Rather than failing, git checkout creates it right now, using the hash ID stored in origin/develop, so that the name develop springs into being, pointing to commit F.

    The next step can be quite complicated (see Checkout another branch when there are uncommitted changes on the current branch) but we're assuming everything is clean, and we can therefore simplify it further: Git takes the contents of the tree in F, puts that into the index, and makes the work-tree match by removing any files that were in it that shouldn't be, and making all the other files match those in the index, but de-compressed. Git attaches the name HEAD to develop. Our commit-graph drawing is unchanged, except for where HEAD is attached to the new name develop:

    A--B--C--D   <-- master, origin/master
           \
            E--F   <-- develop (HEAD), origin/develop
    

    We can now happily modify files in the work-tree. After we are done with that, we run git add on whichever ones we changed. This copies the work-tree files into the index, compressing them and making them ready for committing:

    $ [edit various files]
    $ git add -u             # or -A or `.` or list the files or whatever
    

    Now that the index matches the work-tree, we run:

    $ git commit             # with -m to avoid using the editor, or whatever
    

    Git packages up the index's contents (as a tree with subtrees as appropriate), freezing all the pre-compressed files into a new tree, and makes a new commit. The new commit's parent is F, because HEAD is attached to develop and develop contains F's hash ID. The new commit's tree is the tree Git just packaged up, the author is "us", and so on. Writing out the commit produces a new, unique hash ID for our new commit G. Git writes that hash ID into the name develop, and now we have:

    A--B--C--D   <-- master, origin/master
           \
            E--F   <-- origin/develop
                \
                 G   <-- develop (HEAD)
    

    and our index and work-tree match each other and match commit G, since G was just made from our index.
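
    The steps git commit performs here can be approximated with plumbing commands. This is a simplified sketch, not how git commit is actually implemented: it skips hooks, the editor, and the very first commit in a repository, and the function name commit_from_index is hypothetical.

```shell
# commit_from_index: hypothetical helper. Makes a new commit from the current
# index, with the message given in $1, and moves the current branch to it.
commit_from_index() {
    # Package the index's contents as a tree object (with subtrees).
    tree=$(git write-tree) || return 1
    # The parent is whatever commit HEAD currently resolves to.
    parent=$(git rev-parse --verify HEAD) || return 1
    # Write the commit object; the message is read from standard input.
    new=$(printf '%s\n' "$1" | git commit-tree "$tree" -p "$parent") || return 1
    # Store the new hash ID in the branch that HEAD is attached to, but only
    # if that branch still holds the parent we just read.
    git update-ref -m "commit: $1" HEAD "$new" "$parent"
}
```

    Because the new commit's tree was written from the index, the index matches the new commit as soon as the branch name has been updated.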

    All other items build from here

    In an edit or comment, you mentioned using git stash. What git stash does is to make commits. The thing that is special about the commits that git stash makes is that they are on no branch.

    It actually makes at least two commits, one for the current index content, and one for the work-tree content, in case you have staged (with git add) some stuff and not-staged (by not adding) more stuff. Both of these commits are full snapshots! I like to draw them as i and w:

    A--B--C--D   <-- master, origin/master
           \
            E--F   <-- origin/develop
                \
                 G   <-- develop (HEAD)
                 |\
                 i-w   <-- refs/stash
    

    Git finds the w commit through the special name refs/stash (which is not a branch name—branches are in refs/heads/, e.g., refs/heads/master and refs/heads/develop). The w commit finds the i commit, and also the commit on which the stash was made (G); the i commit also links back to the commit that was current when you made the stash.

    Having made the two (normal stash) or three (--include-untracked or --all) stash commits, git stash runs git reset --hard to clean out the index and work-tree, making them match the current commit. There are more options here as well, but that covers the basic operation of git stash.
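
    Since a stash is just a set of commits hanging off an ordinary commit, one can list each stash entry together with the commit it was made on. The sketch below relies on the parent order described above (first parent of w is the base commit, second parent is i); the function name list_stash_bases is hypothetical.

```shell
# list_stash_bases: hypothetical helper. For each stash entry, prints the
# entry name (stash@{N}), the hash ID of the commit it was made on, and the
# hash ID of its index commit (i).
list_stash_bases() {
    git stash list --format='%gd' | while read -r entry; do
        base=$(git rev-parse "$entry^1")   # first parent of w: base commit
        idx=$(git rev-parse "$entry^2")    # second parent of w: the i commit
        printf '%s %s %s\n' "$entry" "$base" "$idx"
    done
}
```

    Passing a base hash ID to git branch --contains then shows which branches include the commit a given stash was made on, which is about as close as Git gets to "a branch with stashed changes".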

    git worktree add, and other special cases

    Using git worktree, we can create new, additional work-trees that go with the current repository. Each added work-tree has its own index. The index for each work-tree caches (hence the name cache) a lot of data about that work-tree, and Git uses that to quickly scan through, or even avoid scanning, the work-tree: this is how and why the index indexes the work-tree, hence the name index.
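
    With added work-trees, a branch other than the one checked out in the current directory really can have uncommitted changes, because it is checked out somewhere else with its own index. A rough sketch for finding such work-trees follows; the function name dirty_worktrees is hypothetical, and unlike the plumbing checks, git status --porcelain also reports untracked files.

```shell
# dirty_worktrees: hypothetical helper. Prints the path of every work-tree
# of this repository in which "git status" reports something.
dirty_worktrees() {
    # The porcelain listing has one "worktree <path>" line per work-tree.
    git worktree list --porcelain | sed -n 's/^worktree //p' |
    while IFS= read -r wt; do
        # Errors (for example, a bare main repository) are ignored.
        if [ -n "$(git -C "$wt" status --porcelain 2>/dev/null)" ]; then
            printf '%s\n' "$wt"
        fi
    done
}
```

    An empty result means every work-tree and its index match the commit checked out there, with no untracked files lying around.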

    Besides all of these, you can create your own temporary index files! This is how git stash works, for instance. First it makes a mostly-ordinary commit i of the current index, which is easy since Git already knows how to do that, but then it has to commit the work-tree. Git can only build new commits from an index, so what git stash does is to create a temporary index, stuff all the work-tree files into it, and use that to make the w commit.
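
    The temporary-index technique can be sketched with the GIT_INDEX_FILE environment variable, which tells Git to use a different index file. This is an illustration of the idea only, not the actual git stash implementation; the function name snapshot_worktree_tree is hypothetical, and the sketch covers tracked files only.

```shell
# snapshot_worktree_tree: hypothetical helper. Prints the hash ID of a tree
# object holding the current work-tree state of all tracked files, without
# touching the real index.
snapshot_worktree_tree() {
    tmp_dir=$(mktemp -d) || return 1
    (
        # Point Git at a not-yet-existing index file inside the temp dir.
        export GIT_INDEX_FILE="$tmp_dir/index"
        git read-tree HEAD &&   # start from the files in the current commit
        git add -u &&           # fold in work-tree changes to tracked files
        git write-tree          # write the tree and print its hash ID
    )
    rc=$?
    rm -rf "$tmp_dir"
    return $rc
}
```

    The subshell keeps the GIT_INDEX_FILE setting from leaking out, so the regular index, and anything staged in it, is left exactly as it was.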

    A number of other fancy Git tricks can make use of temporary or alternate index files. One must use care with these, since Git in general assumes that the index indexes / caches the work-tree—or that the added, per-work-tree index caches that particular added work-tree—and if you get things out of sync, some interestingly subtle failures occur.