git sparse-checkout: check out only the files inside a repo's subdirectory, at the root?


I have created a git repo, added the following file structure, and pushed it to a remote:

test
├── dir1
│   ├── file1.txt
│   └── file2.txt
└── dir2
    ├── file1.txt
    ├── file2.txt
    ├── file3.txt
    └── file4.txt

Let's say dir1 contains project documentation, initial zip files, and generally stuff I don't want to clone. I would like to perform a sparse clone and end up with this file structure:

test_sparse
├── file1.txt
├── file2.txt
├── file3.txt
└── file4.txt

So I perform the following on my local machine:

mkdir test_sparse
cd test_sparse
git init
git config core.sparsecheckout true
echo 'dir2/*' > .git/info/sparse-checkout
git remote add -f origin <url>/git/test
git pull origin master
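The steps above can be replayed end to end against a throwaway local repository standing in for <url>/git/test, which makes the behavior easy to inspect. This is a sketch under stated assumptions: the temporary paths, the "demo" commit identity, and the branch name master are illustrative only. The sparse pattern is quoted so the shell passes dir2/* through literally instead of trying to expand it.

```shell
#!/bin/sh
# Replays the question's steps against a local stand-in for <url>/git/test.
# Assumptions (illustrative): throwaway paths from mktemp, a "demo" identity,
# and an upstream branch named master.
set -eu

work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Build the "test" repository with the layout from the question
git init -q "$work/test"
cd "$work/test"
git symbolic-ref HEAD refs/heads/master
mkdir dir1 dir2
for f in file1 file2; do echo "$f" > "dir1/$f.txt"; done
for f in file1 file2 file3 file4; do echo "$f" > "dir2/$f.txt"; done
git add .
git commit -q -m "initial layout"

# The question's steps; the pattern is quoted so the shell passes it literally
mkdir "$work/test_sparse"
cd "$work/test_sparse"
git init -q
git config core.sparseCheckout true
echo 'dir2/*' > .git/info/sparse-checkout
git remote add -f origin "$work/test" > /dev/null
git pull -q origin master

# List the checked-out files (skipping .git): everything sits under dir2/
find . -path ./.git -prune -o -type f -print | sort
```

The listing shows only paths under ./dir2/ and nothing from dir1/, which is the result described in the question: the sparse pattern decides which files get written, not where they land.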

But I end up with the following file structure:

test_sparse
└── dir2
    ├── file1.txt
    ├── file2.txt
    ├── file3.txt
    └── file4.txt

What do I need to write to .git/info/sparse-checkout so that only the files (and any subdirectories) inside dir2 are checked out at the top level, without the dir2 directory itself?

Thank you


Solution

  • That's how sparse checkout works: it keeps every path where it is and extracts only the parts you ask for. So the answer to your question as asked is that sparse checkout will not restructure your history, and it will not hide the existence of the other parts; it just won't put their content in the work tree for your inspection.

    Git doesn't come with many newbie-friendly convenience commands for casual structure surgery. Git can do it, though; it's built for that. There are power tools galore, and they're not particularly difficult to use. For a starter kit, look at git read-tree -um, and at git merge -Xsubtree or git merge -Xsubtree=dir2 to maintain a locally sliced history, which Git will handle just fine.

    But the administrative overhead of so completely hiding the existence of stuff you're not working with at the moment generally seems to exceed any savings, so Git's convenience commands don't go there. Just do git sparse-checkout set dir2 (or git sparse-checkout set --no-cone 'dir2/**' if you want the pattern form, since cone mode, the default in recent Git, rejects wildcard patterns) and be done with it. If the slice you're after is so completely severable that it and everything else are usually treated as entirely separate histories, the Git way is to make them entirely separate histories. Nothing says you can't have disjoint histories in one repo and check them out or combine them into any structure you like. But Git gets its speed and efficiency from knowing where those boundaries are, so to keep everything fast and efficient the way Git and its users like it, tell Git what you're doing up front.
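The "locally sliced history" idea mentioned in the answer (git read-tree -um plus git merge -Xsubtree=dir2) can be sketched as follows. This is one possible arrangement under stated assumptions, not a prescribed recipe: throwaway repositories stand in for the remote, the upstream branch is master, and the local branch name dir2-only is hypothetical. The plumbing command git commit-tree is swapped in for the read-tree step to create the initial slice, because it builds the commit directly from the tree origin/master:dir2 without touching the work tree.

```shell
#!/bin/sh
# Sketch: a local branch whose root is dir2's content, kept current with
# git merge -Xsubtree=dir2. Assumptions (illustrative): throwaway repos from
# mktemp, a "demo" identity, upstream branch master, and a hypothetical local
# branch "dir2-only". git commit-tree stands in for the git read-tree -um step.
set -eu

work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Upstream repository with the question's layout
git init -q "$work/test"
cd "$work/test"
git symbolic-ref HEAD refs/heads/master
mkdir dir1 dir2
echo docs > dir1/file1.txt
for f in file1 file2 file3 file4; do echo "$f" > "dir2/$f.txt"; done
git add .
git commit -q -m "initial layout"

# Local repo: fetch, then commit origin/master's dir2 tree as a root-level
# snapshot whose parent is origin/master, so later merges share a base
git init -q "$work/test_sparse"
cd "$work/test_sparse"
git remote add origin "$work/test"
git fetch -q origin
slice=$(git commit-tree -p origin/master -m "dir2 at root" origin/master:dir2)
git checkout -q -b dir2-only "$slice"

# Upstream later edits files in both directories
cd "$work/test"
echo "file1 v2" > dir2/file1.txt
echo "docs v2" > dir1/file1.txt
git commit -q -am "update dir1 and dir2"

# Merge upstream; -Xsubtree=dir2 maps their dir2/ onto our root,
# so dir1/ never appears in the slice's work tree
cd "$work/test_sparse"
git fetch -q origin
git merge -q -Xsubtree=dir2 -m "merge upstream dir2" origin/master
ls
```

Giving the slice commit origin/master as its parent is what provides the later merge with a common base; without it, the first merge would need --allow-unrelated-histories. The slice still carries dir1 in its ancestry, which fits the point above that Git restructures what you see without hiding what exists in the history.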