git performance github git-clone git-lfs

Faster alternative to git lfs clone from remote GitHub repositories?


Objective

I have a remote GitHub repository that uses git-lfs to hold large binary files.

  • I want others to be able to quickly download my code and data.
  • If it makes the download faster, others do not need to keep their copies of the repository under git version control.
  • Preferably, I would also like to understand why a given approach is slow or fast.

Baseline approach (git lfs clone)

As a test of how others will download my repository, I ran the following command on a high-performance login node (72 Intel Xeon CPUs) of a Linux cluster, on a GPFS file system, with these versions of git and git-lfs:

  • git version 2.10.2
  • git-lfs/2.3.4 (GitHub; linux amd64; go 1.9.1; git d2f6752f)
$ time git lfs clone --progress git@github.com:PackardChan/chk2019-blocking-extreme.git
Cloning into 'chk2019-blocking-extreme'...
remote: Enumerating objects: 138, done.
remote: Counting objects: 100% (138/138), done.
remote: Compressing objects: 100% (114/114), done.
remote: Total 138 (delta 20), reused 138 (delta 20), pack-reused 0
Receiving objects: 100% (138/138), 148.16 MiB | 36.59 MiB/s, done.
Resolving deltas: 100% (20/20), done.
Git LFS: (64 of 64 files) 7.29 GB / 7.29 GB                                                              

real    4m51.156s
user    7m14.044s
sys     0m28.360s

This took nearly 5 minutes, even on a high-performance node. I noticed that the last line of output reached the full 7.29 GB after only 36 seconds. For the rest of the time, the process running was git update-index -q --refresh --stdin (according to top -c).

I therefore believe performance could be improved substantially if update-index can be skipped. As mentioned under "Objective", I don't mind giving up git version control if that makes the download faster.
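
As a rough check of that split, the figures from the timing output can be plugged into a short calculation. This is only a sketch: the 36 seconds is the value read off by eye above, 1 GB is assumed to mean 1000 MB, and the variable names are made up for illustration.

```shell
#!/bin/sh
# Figures copied from the timing output above.
total_s=291.156   # "real 4m51.156s" expressed in seconds
download_s=36     # observed time until the LFS progress line reached 7.29 GB
size_gb=7.29      # total LFS payload (assumption: 1 GB = 1000 MB)

# Work out download throughput and how much wall time was left after the download.
breakdown=$(awk -v t="$total_s" -v d="$download_s" -v g="$size_gb" 'BEGIN {
    rest = t - d
    printf "download: %.1f MB/s, post-download: %.1f s (%.1f%% of wall time)", g * 1000 / d, rest, 100 * rest / t
}')
echo "$breakdown"
```

By this estimate, most of the wall time falls after the download has finished, which is consistent with update-index being the bottleneck.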

Other unsuccessful attempts

  1. svn export

Inspired by this post, I tried:

time svn export https://github.com/PackardChan/chk2019-blocking-extreme/trunk z4svn

However, the LFS files are not downloaded correctly. This is also reported here.

  2. git archive

This was not an option, since GitHub doesn't support git-archive.

  3. --depth=1

I tried this, but it did not perform better. This is understandable, as my repository has only one commit.

I am rather new to git. So, am I missing anything?


Solution

  • I'm answering my own question. It turns out the problem was that I hadn't run git lfs install to set up ~/.gitconfig.

    git lfs install [options]

    Perform the following actions to ensure that Git LFS is setup properly:

    • Set up the clean and smudge filters under the name "lfs" in the global Git config.
    • Install a pre-push hook to run git lfs pre-push for the current repository, if run from inside one. If "core.hooksPath" is configured in any Git configuration (and supported, i.e., the installed Git version is at least 2.9.0), then the pre-push hook will be installed to that directory instead.

    After running it, git config --list reports four more lines of configuration:

    filter.lfs.clean=git-lfs clean -- %f
    filter.lfs.smudge=git-lfs smudge -- %f
    filter.lfs.process=git-lfs filter-process
    filter.lfs.required=true
    

    Now the same command, time git lfs clone --progress git@github.com:PackardChan/chk2019-blocking-extreme.git, takes only around 1 minute.
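
    For anyone who wants to check their own setup before cloning, here is a minimal sketch that looks for the four filter.lfs.* keys listed above in the output of git config --list. The function name has_lfs_filters is hypothetical, not part of git or git-lfs, and the check only confirms that the keys exist, not that their values are correct.

```shell
#!/bin/sh
# Hypothetical helper: reads `git config --list` output on stdin and succeeds
# only if all four Git LFS filter keys are present.
has_lfs_filters() {
    config=$(cat)
    for key in clean smudge process required; do
        printf '%s\n' "$config" | grep -q "^filter\.lfs\.$key=" || return 1
    done
    return 0
}

# Check the current configuration and suggest the fix if the keys are missing.
if command -v git >/dev/null 2>&1 && git config --list | has_lfs_filters; then
    echo "Git LFS filters are configured"
else
    echo "Git LFS filters not found; try: git lfs install"
fi
```

    If the second message appears, running git lfs install once and re-running the check should show the filters as configured.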