I am trying to clone a repository with a lot of blobs in its history, and would like to only download the files at a specific commit without any added overhead or redundancy.
When trying git clone --depth 1, the .git directory becomes quite large. This appears to be because of a large packfile, whose file size corresponds to the size reported by git during the "Receiving objects:" phase. Inspecting the packfile with git verify-pack suggests it contains a large amount of blob information.
However, trying git clone --filter=blob:none still results in a similarly large packfile listing blobs.
My expectation would be that --depth 1 shouldn't be downloading any history, and --filter=blob:none shouldn't be downloading any blob history.
So why is my .git directory being populated with packfile overhead for a shallow clone?
I am wondering if this is perhaps the initial compressed download of the single commit I checked out - but even so, how can I prevent this redundant file from persisting?
For specific reference, the repository I am cloning is ARM-software/CMSIS_5.
This started out as a question about shallow submodules and only downloading files at a specific commit without overhead, but the packfile overhead appears to pertain to cloning in general so I figured I'd start here.
Inspecting the packfile with git verify-pack suggests it contains a large amount of blob information.
When you run...
git clone --filter blob:none --depth 1 https://github.com/ARM-software/CMSIS_5
...you are still checking out a working copy from the repository. The repository in question has over 3000 files in it:
$ find * -type f -print | wc -l
3225
Since file content is stored in blobs, git still needs to transfer the blobs that correspond to the files in the HEAD commit, regardless of the blob:none filter, so we would expect to see a similar number of blobs in the packfiles. And indeed, after running the above command, we see:
$ git verify-pack -v .git/objects/pack/pack-b0279f34420775288c089456dfc84f2697570837.pack |
grep blob | wc -l
2807
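The blob count is somewhat lower than the file count. One plausible reason (an assumption, not verified against this repository) is that files with identical content share a single blob, since git addresses objects by content. A minimal sketch for checking this, using a hypothetical helper named unique_blob_count that reads `git ls-tree -r` output on stdin:

```shell
# Hypothetical helper: count distinct blob IDs in `git ls-tree -r` output.
# Each input line looks like: <mode> <type> <object-id><TAB><path>
unique_blob_count() {
  # Keep only blob entries, print the object ID, then de-duplicate.
  awk '$2 == "blob" { print $3 }' | sort -u | wc -l | tr -d ' '
}
```

Inside the clone, running git ls-tree -r HEAD | unique_blob_count gives the number of distinct blobs the checkout needs, which can then be compared with the number of paths that git ls-tree -r HEAD lists.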
If you don't check out a working copy (e.g., you clone with --bare), the resulting repository will not contain any blobs:
$ git clone --bare --filter blob:none --depth=1 https://github.com/ARM-software/CMSIS_5/
$ find CMSIS_5.git/objects/pack/ -name '*.pack' | xargs -n1 git verify-pack -v | grep blob | wc -l
0
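The grep blob | wc -l pipelines above count a single object type. To see what else a packfile holds, a tally of all types can help. Here is a minimal sketch, assuming the usual `git verify-pack -v` layout in which the second column is the object type; pack_type_counts is a hypothetical helper name:

```shell
# Hypothetical helper: tally objects by type from `git verify-pack -v`
# output read on stdin. Summary lines (e.g. "non delta: ...") are skipped
# because their second field is not an object type.
pack_type_counts() {
  awk '$2 ~ /^(commit|tree|blob|tag)$/ { n[$2]++ }
       END { for (t in n) print t, n[t] }' | sort
}
```

For example, git verify-pack -v on a pack file piped into pack_type_counts prints one line per object type with its count. For the bare, blob-filtered clone, this should list commits and trees but no blob line.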