Tags: git, filter, clone, checkout, sparse-checkout

Git clone and download only the files that need to be changed and committed


We have a repository with several large SQL files, ranging from 100MB to 10GB in size.

We have been trying to set up local cloning so that we only download the SQL file(s) that need to be changed and committed at any given time, instead of downloading all of them even when we only need one.

I've been able to get close with the following commands. Everything works until I commit my changes; at that point, Git downloads all the files in the current branch.

git clone --filter=blob:none --depth 1 -n --sparse <url>
cd <repoDir>
git sparse-checkout set <fileNeedingChangedAndCommitted>
git restore --staged .
git restore <fileNeedingChangedAndCommitted>
# At this point, the file I need to change is downloaded locally, ready for changes.
# Make changes to file.
git add <fileNeedingChangedAndCommitted>
git commit -m "test"
# At this point, all other files in current branch are downloaded, even if not changed.

I feel like this should be possible, but maybe I'm misunderstanding the concept of sparse-checkout or missing a step/detail.

Is there any way to download only the files you want to change, and then commit those changes without downloading every file in the current branch?

EDIT: From my testing and the discussion on this question, I concluded that this isn't possible with Git. Instead, we decided to keep each SQL file in its own orphan branch of the same repo, so each file has its own commit history but everything stays in one repository for organization. This lets us check out only the branches/files we need at any given time, and make changes and commits without downloading the blobs of the other SQL files. This won't work for every situation, but it meets our requirements for now :)
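
For anyone wanting to try the orphan-branch layout, here is a rough sketch using a throwaway local repository. The branch names (sql-a, sql-b), file names, and temporary paths are made up for illustration, and the local file:// URL stands in for the real remote.

```shell
#!/bin/sh
set -eu

# Throwaway locations; every path, branch, and file name here is hypothetical.
work=$(mktemp -d)
src="$work/src"
dst="$work/dst"

git init -q "$src"
git -C "$src" config user.name "Example"
git -C "$src" config user.email "example@example.com"

# An initial commit on the default branch, so the repo is not empty.
printf 'SQL dumps live on their own branches.\n' > "$src/README"
git -C "$src" add README
git -C "$src" commit -q -m "Add README"

# One orphan branch per SQL file: each starts a history with no parent commit.
for name in a b; do
    git -C "$src" checkout -q --orphan "sql-$name"
    # --orphan keeps the previous branch's files staged; clear them out.
    git -C "$src" rm -rfq .
    printf 'INSERT INTO %s VALUES (1);\n' "$name" > "$src/$name.sql"
    git -C "$src" add "$name.sql"
    git -C "$src" commit -q -m "Add $name.sql"
done

# Clone only the branch that holds the file we want to edit.
git clone -q --depth 1 --single-branch --branch sql-a "file://$src" "$dst"
ls "$dst"

# Check whether the other file's blob was transferred.
b_blob=$(git -C "$src" rev-parse sql-b:b.sql)
if git -C "$dst" cat-file -e "$b_blob" 2>/dev/null; then
    echo "b.sql blob is present in the clone"
else
    echo "b.sql blob is absent from the clone"
fi
```

Commits made in the clone can then be pushed back to the same branch without touching the other branches.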


Solution

  • I feel like this should be possible, but maybe I'm misunderstanding the concept of sparse-checkout

    The problem is not sparse checkout; the problem is --filter=blob:none. This filter stops Git from downloading blobs at clone time, but Git still downloads the missing objects later, on demand, when they are accessed.
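
The on-demand behavior can be observed on a small scale. The sketch below is an illustration only: it builds a stand-in repository with made-up file names, serves it over a local file:// URL, and assumes the two uploadpack settings shown are enough for the stand-in "server" to accept the filter and later single-object requests.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
src="$work/src"
dst="$work/dst"

# Stand-in repository with three small files (all names are hypothetical).
git init -q "$src"
git -C "$src" config user.name "Example"
git -C "$src" config user.email "example@example.com"
for name in a b c; do
    printf 'INSERT INTO %s VALUES (1);\n' "$name" > "$src/$name.sql"
done
git -C "$src" add .
git -C "$src" commit -q -m "Add SQL files"

# Let the stand-in server honor filters and on-demand object requests.
git -C "$src" config uploadpack.allowFilter true
git -C "$src" config uploadpack.allowAnySHA1InWant true

# Blobless clone without a checkout: commits and trees arrive, blobs do not.
git clone -q --filter=blob:none --no-checkout "file://$src" "$dst"

count_missing() {
    # Reachable objects that are absent locally are printed with a leading "?".
    git -C "$dst" rev-list --objects --missing=print HEAD | grep -c '^?' || true
}

missing_before=$(count_missing)

# Reading one file's content makes Git fetch that blob from the remote.
git -C "$dst" cat-file -p HEAD:a.sql > /dev/null

missing_after=$(count_missing)
echo "missing blobs before: $missing_before, after: $missing_after"
```

Running the same count_missing check before and after a command in a real clone is one way to find out which step triggers the download.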

    Is there any way to download only the files you want to change, and then commit those changes without downloading every file

    Most probably not. Conceptually, Git stores a copy of the entire working tree in every commit. I say "conceptually" because, technically, Git does its best never to store copies; it saves pointers to existing objects instead. To construct a commit, Git needs all the trees and blobs from the previous commit, so that is what it downloads. With sparse checkout but without the filter, Git would already have all the necessary objects and would not download anything at commit time; the price is that everything must be downloaded up front.
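
The "pointers instead of copies" part can be seen in a scratch repository. This is only a sketch with made-up file names: two commits are created, only a.sql changes between them, and the blob IDs recorded for each file are compared.

```shell
#!/bin/sh
set -eu

# Scratch repository; the path and file names are hypothetical.
repo=$(mktemp -d)
git init -q "$repo"
git -C "$repo" config user.name "Example"
git -C "$repo" config user.email "example@example.com"

printf 'SELECT 1;\n' > "$repo/a.sql"
printf 'SELECT 2;\n' > "$repo/b.sql"
git -C "$repo" add .
git -C "$repo" commit -q -m "Add a.sql and b.sql"

# The second commit changes only a.sql.
printf 'SELECT 10;\n' > "$repo/a.sql"
git -C "$repo" commit -q -am "Change a.sql"

# The commit's tree lists every file, but as blob IDs rather than copies.
git -C "$repo" ls-tree HEAD

a_old=$(git -C "$repo" rev-parse HEAD~1:a.sql)
a_new=$(git -C "$repo" rev-parse HEAD:a.sql)
b_old=$(git -C "$repo" rev-parse HEAD~1:b.sql)
b_new=$(git -C "$repo" rev-parse HEAD:b.sql)

if [ "$b_old" = "$b_new" ]; then
    echo "b.sql: both commits point at the same blob"
fi
if [ "$a_old" != "$a_new" ]; then
    echo "a.sql: the second commit points at a new blob"
fi
```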

    The bottom line: you can download and use as little as possible locally, but once you commit, Git will need all the objects. So either tolerate Git downloading the required objects at that point, or let Git download everything up front by removing the filter: git clone --depth 1 -n --sparse <url>