Why do excluded files keep reappearing in my git sparse checkout?


I use the GCC git mirror and because I only use the C and C++ front ends I use git's sparse checkout feature to exclude the hundreds of files I don't need:

$ git config core.sparseCheckout
true
$ cat .git/info/sparse-checkout 
/*
!gnattools/
!libada/
!libgfortran/
!libgo/
!libjava/
!libobjc/
!libquadmath/
!gcc/ada/
!gcc/fortran/
!gcc/go/
!gcc/java/
!gcc/objc/
!gcc/objcp/
!gcc/testsuite/ada/
!gcc/testsuite/gfortran.dg/
!gcc/testsuite/gfortran.fortran-torture/
!gcc/testsuite/gnat.dg/
!gcc/testsuite/go.dg/
!gcc/testsuite/go.go-torture/
!gcc/testsuite/go.test/
!gcc/testsuite/objc/
!gcc/testsuite/objc.dg/
!gcc/testsuite/obj-c++.dg/
!gcc/testsuite/objc-obj-c++-shared/

This works for a while, but every now and then I notice that some of those excluded files have returned, sometimes lots of them:

$ ls gnattools/
ChangeLog  configure  configure.ac  Makefile.in
$ ls  gcc/fortran/ | wc -l 
86

I'm not sure exactly when the files reappear. I do a lot of switching between branches (both remote-tracking and local), and it's a very busy repo, so there are frequently new changes to pull.

As a relative newbie to git I don't know how to "reset" my work tree to get rid of those files again.

As an experiment, I tried disabling sparse checkout and pulling, thinking I could enable sparseCheckout again afterwards to update the tree somehow, but that didn't work very well:

$ git config core.sparseCheckout false
$ git config core.sparseCheckout 
false
$ git pull
remote: Counting objects: 276, done.
remote: Compressing objects: 100% (115/115), done.
remote: Total 117 (delta 98), reused 0 (delta 0)
Receiving objects: 100% (117/117), 64.05 KiB, done.
Resolving deltas: 100% (98/98), completed with 64 local objects.
From git://gcc.gnu.org/git/gcc
   7618909..0984ea0  gcc-4_5-branch -> origin/gcc-4_5-branch
   b96fd63..bb95412  gcc-4_6-branch -> origin/gcc-4_6-branch
   d2cdd74..2e8ef12  gcc-4_7-branch -> origin/gcc-4_7-branch
   c62ec2b..fd9cb2c  master     -> origin/master
   2e2713b..29daec8  melt-branch -> origin/melt-branch
   c62ec2b..fd9cb2c  trunk      -> origin/trunk
Updating c62ec2b..fd9cb2c
error: Your local changes to the following files would be overwritten by merge:
        gcc/fortran/ChangeLog
        gcc/fortran/iresolve.c
        libgfortran/ChangeLog
        libgfortran/io/intrinsics.c
Please, commit your changes or stash them before you can merge.
Aborting

So apparently I've got local modifications to files I never asked for and AFAIK have never touched!

But git status doesn't show those changes:

$ git st
# On branch master
# Your branch is behind 'origin/master' by 9 commits, and can be fast-forwarded.
#
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#
#       libstdc++-v3/53270.txt
#       libstdc++-v3/TODO

I've tried git read-tree -m -u HEAD but it doesn't do anything.

So my questions are:

  • Why do the files reappear?
  • How do I make them disappear again?
  • How do I prevent them coming back?
  • Is this possibly related to the fact that my .git/info/exclude file contains references to files in the directories that are supposed to be excluded (i.e. named with !) in the sparse-checkout file? I followed the instructions to ignore the same files that SVN does:

    $ git svn show-ignore >> .git/info/exclude

So my exclude file includes paths such as:

# /gcc/fortran/
/gcc/fortran/TAGS
/gcc/fortran/TAGS.sub
/gcc/fortran/gfortran.info*

These paths are below one of the directories named in the sparse-checkout file:

!gcc/fortran/

I've tried to reproduce the problem with a test repo: I clone a few copies of it, edit each of them, create/switch/delete branches and merge changes between them, but it never goes wrong in my toy testcases. The GCC repo is a bit big (over 2GB) and the time between "failures" (on the order of a week or two) is too long to expect people to try to reproduce the problem exactly. I haven't experimented with having the same paths in sparse-checkout and exclude, as it only occurred to me today that there might be a conflict there.

I asked about this on #git on freenode a few weeks ago and IIRC was basically told "it's probably a bug, no one uses sparse checkout", but I'm hoping for a better answer ;-)

Update:

The most recent time I saw the problem actually happen (i.e. the files weren't there, then appeared after a single command) was doing a pull from the upstream origin:

   bac6f1f..6c760a6  master     -> origin/master

and among the changes shown were these renames:

 create mode 100644 libgo/go/crypto/x509/root.go
 rename libgo/go/crypto/{tls => x509}/root_darwin.go (90%)
 rename libgo/go/crypto/{tls => x509}/root_stub.go (51%)
 rename libgo/go/crypto/{tls => x509}/root_unix.go (76%)
 create mode 100644 libgo/go/crypto/x509/root_windows.go

Before the pull the libgo directory was absent, as desired. After the pull that dir was present and these files (and no others) were under it:

$ ls libgo/go/crypto/x509/root_<TAB>
root_darwin.go  root_stub.go    root_unix.go    

I don't know whether the renamed files lost their skip-worktree bit. How do I check that?

I'm pretty sure the problem doesn't always happen when there are renames, because e.g. the libgfortran/ChangeLog file shown in an example above is not a new file or recently renamed.


Solution

  • The skip-worktree bit can be set with git update-index --skip-worktree (and cleared with git update-index --no-skip-worktree). When you notice the files present, you can check their state with git ls-files -v | grep ^S (an S in the first column marks a file with skip-worktree set).

    But as the #git folks say, if you see odd behavior it is most likely a bug in git. After all, this is quite an esoteric feature. You should probably report your findings to the git mailing list.

    Edit: Also, if you are using git 1.7.7.6, I strongly recommend upgrading. The 1.7.10 tree is way ahead, and I think there is a strong chance it will fix your problems.
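
To make the check above concrete, here is a minimal sketch that runs in a scratch repository rather than in the GCC clone. The directory and file names (gcc/main.c, libgo/root_unix.go, libgo/root_stub.go) and the temporary location are made up for illustration. It only shows how to read and restore the skip-worktree bit with git ls-files -v and git update-index. It does not try to reproduce or explain the reappearing-files behaviour.

```shell
#!/bin/sh
# Sketch only: everything happens in a throwaway repository.
set -e

repo=$(mktemp -d)            # hypothetical scratch location
cd "$repo"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"

# Illustrative layout: one directory we want, one we would like skipped.
mkdir -p gcc libgo
echo 'int main(void) { return 0; }' > gcc/main.c
echo 'package x509' > libgo/root_unix.go
echo 'package x509' > libgo/root_stub.go
git add .
git commit -q -m "initial import"

# Set the bit on one file only, to mimic a directory where some
# entries have the bit and others do not.
git update-index --skip-worktree libgo/root_stub.go

# Entries tagged "S" carry the skip-worktree bit.
skipped=$(git ls-files -v | grep '^S' | cut -c3-)

# Entries under libgo/ tagged "H" are ordinary cached entries,
# i.e. the bit is not set on them.
leaked=$(git ls-files -v -- libgo/ | grep '^H' | cut -c3-)

echo "skip-worktree set: $skipped"
echo "bit missing:       $leaked"

# Restore the bit on every tracked file under libgo/.
git ls-files -z -- libgo/ | xargs -0 git update-index --skip-worktree

# Count entries under libgo/ that still lack the bit.
still_leaked=$(git ls-files -v -- libgo/ | grep -c '^H' || true)
echo "entries still missing the bit: $still_leaked"
```

Note that git update-index --skip-worktree only changes the index entry. It does not delete a copy of the file that is already present in the work tree, so in a real clone the stray files would still have to be removed separately.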