Tags: git, version-control, git-clone, internals, shallow-clone

Why does `git clone --depth 1` leave packfiles?


I am trying to clone a repository with a lot of blobs in its history, and would like to only download the files at a specific commit without any added overhead or redundancy.

When I try `git clone --depth 1`, the `.git` directory becomes quite large. This appears to be caused by a single large packfile, whose size matches the figure Git reports on its `Receiving objects:` progress line. Inspecting the packfile with `git verify-pack` suggests it contains a large number of blobs.

However, trying `git clone --filter=blob:none` still results in a similarly large packfile listing blobs.

My expectation is that `--depth 1` shouldn't download any history, and `--filter=blob:none` shouldn't download any blob history.

So why is my .git directory being populated with packfile overhead for a shallow clone?

I wonder whether this is the initial compressed download of the single commit I checked out. Even so, how can I prevent this redundant file from persisting?

For specific reference, the repository I am cloning is ARM-software/CMSIS_5.

This started out as a question about shallow submodules and only downloading files at a specific commit without overhead, but the packfile overhead appears to pertain to cloning in general so I figured I'd start here.


Solution

  • From the question: "Inspecting the packfile with `git verify-pack` suggests it contains a large amount of blob information."

    When you run...

    git clone --filter blob:none --depth 1 https://github.com/ARM-software/CMSIS_5
    

    ...you are still checking out a working copy from the repository. The repository in question has over 3000 files in it:

    $ find * -type f -print | wc -l
    3225
    

    File content is stored in blobs, so populating the working tree requires every blob referenced by the HEAD commit. The `blob:none` filter only keeps blobs out of the initial transfer; in a partial clone, Git fetches the missing blobs on demand as soon as checkout needs them. We would therefore expect the packfiles to hold a number of blobs of the same order of magnitude as the number of files. And indeed, after running the above command, we see:

    $ git verify-pack -v .git/objects/pack/pack-b0279f34420775288c089456dfc84f2697570837.pack |
      grep blob | wc -l
    2807
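The blob count (2807) is lower than the file count (3225). One likely reason, not verified against CMSIS_5 itself, is that Git addresses objects by content hash, so files with identical content share a single blob. The sketch below illustrates this on a throwaway repository; the file names are hypothetical and nothing in it touches the real repository.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch repository for illustration only.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Three files, but only two distinct contents.
printf 'same\n' > a.txt
printf 'same\n' > b.txt
printf 'different\n' > c.txt

# Staging alone is enough for Git to write the blobs to the object store.
git add .

# Count tracked files and the blob objects that back them.
files=$(git ls-files | grep -c .)
blobs=$(git cat-file --batch-check --batch-all-objects | grep -c ' blob ')
echo "files=$files blobs=$blobs"
```

Because `a.txt` and `b.txt` hash to the same object, the script should report three files backed by two blobs.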
    

    If you don't check out a working copy (e.g., you clone with `--bare`), the resulting repository will not contain any blobs:

    $ git clone --bare --filter blob:none --depth=1 https://github.com/ARM-software/CMSIS_5/
    $ find CMSIS_5.git/objects/pack/ -name '*.pack' | xargs -n1 git verify-pack -v | grep blob | wc -l
    0
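The same comparison can be reproduced offline. This is a minimal sketch under stated assumptions: the two-commit source repository and the `count_blobs` helper are hypothetical, the installed Git supports partial clone, and the serving repository must opt in through `uploadpack.allowFilter` (GitHub already permits filters on its side).

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
src="$work/src"

# Build a throwaway "server" repository with two commits (hypothetical data).
git init -q "$src"
git -C "$src" config user.name demo
git -C "$src" config user.email demo@example.com
git -C "$src" config commit.gpgsign false
printf 'v1\n' > "$src/file.txt"
git -C "$src" add file.txt
git -C "$src" commit -q -m 'first'
printf 'v2\n' > "$src/file.txt"
git -C "$src" commit -q -am 'second'

# A local upload-pack ignores --filter unless the serving repo opts in.
git -C "$src" config uploadpack.allowFilter true

# Hypothetical helper: count blob objects physically present in a repository.
count_blobs() {
    git -C "$1" cat-file --batch-check --batch-all-objects | grep -c ' blob ' || true
}

# file:// forces the normal transport; a plain local path would ignore --depth.
git clone -q --depth 1 "file://$src" "$work/shallow"
git clone -q --bare --depth 1 --filter=blob:none "file://$src" "$work/bare.git"

shallow_blobs=$(count_blobs "$work/shallow")
bare_blobs=$(count_blobs "$work/bare.git")
echo "shallow=$shallow_blobs bare=$bare_blobs"
```

The shallow clone with a working tree should hold only the blob for the current `file.txt`, not the older version, while the bare filtered clone should hold none.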