
How to clone a list of Git repositories?


I have a list of 70+ Git repo URLs (students' repositories). Is there any feature that allows me to clone them all at once?

Is there something similar for synchronizing the repositories with the server?

If not, I guess I'd need to write a quick shell script in order to do this.


Solution

  • Shell scripting.

    Getting the repos

    The principal idea to get the repos is

    while IFS= read -r repo; do
        git clone "$repo"
    done < repolist.txt
    

    assuming the file "repolist.txt" contains one repo URL per line.
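
    For a list of 70+ repos it may help if one bad URL does not stop the whole run. The following is a minimal sketch under a few assumptions: the function name clone_all, the optional destination directory, and the convention of skipping blank lines and lines starting with "#" are illustrative and not part of the loop above.

```shell
#!/bin/sh
# clone_all LIST_FILE [DEST_DIR]  (hypothetical helper name)
# Clones every URL listed in LIST_FILE into DEST_DIR (default: current dir).
# Blank lines and lines starting with '#' are skipped (assumed convention).
# Returns non-zero if at least one clone failed.
clone_all() {
    list=$1
    dest=${2:-.}
    failed=0
    mkdir -p "$dest" || return 1
    # The "|| [ -n ... ]" part also handles a last line without a newline.
    while IFS= read -r repo || [ -n "$repo" ]; do
        case $repo in
            ''|'#'*) continue ;;
        esac
        # Subshell: the cd does not leak into the next iteration.
        # </dev/null: git cannot swallow the rest of the list from stdin.
        if ! ( cd "$dest" && git clone --quiet "$repo" ) </dev/null; then
            echo "clone failed: $repo" >&2
            failed=$((failed + 1))
        fi
    done < "$list"
    [ "$failed" -eq 0 ]
}
```

    Usage would then be something like: clone_all repolist.txt students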

    Updating the repos

    This one is trickier.

    While it's easy to iterate over the list of repos, "synchronizing" poses a conceptual problem. When you clone the "normal" way (that is, without command-line options that change git clone's defaults), every branch of the source repo is created in your local repo as a so-called "remote branch" (more precisely, a remote-tracking branch). Those remote branches merely track the state of the matching branches in the source repo. A single branch, the one designated as "current" in the source repo, is then taken, and a local (that is, yours only) branch is created from it. That's why cloning a repo with 100 branches leaves you with only a single local branch (usually "master").

    It follows that automatic "synchronization" is a moot point here: when you run git fetch origin in a "normally" cloned repo, the remote branches are updated with their new contents and are hence almost [1] fully synchronized. Note that your local branches are not touched at all. That's because you might have done local work on them, so you have to decide how you want to reconcile the updated state of the remote branches with your local branches, if at all. This is simply the default work model Git assumes, because it is what's needed in most cases.
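
    To see this split between remote branches and local branches in practice, one can fetch and then count how far the checked-out local branch lags behind its upstream. This is only a sketch: the helper name behind_count is made up for illustration, and it assumes a "normally" cloned repo in which the local branch has an upstream configured (which git clone does by default).

```shell
#!/bin/sh
# behind_count REPO_DIR  (hypothetical helper name)
# Updates the remote branches from origin, then prints how many commits the
# checked-out local branch is behind its upstream remote branch.
# The local branch itself is not modified.
behind_count() {
    git -C "$1" fetch --quiet origin || return 1
    git -C "$1" rev-list --count 'HEAD..@{upstream}'
}
```

    A non-zero count means the remote branch has moved while the local branch stayed where it was, which is exactly the reconciliation decision Git leaves to you.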

    If, instead, you don't intend to do any work on the branches of those repos, and they are for inspection only, the easiest approach is to make Git have no remote branches at all.

    To do this, you clone using several explicit steps:

    1. Initialize an empty repository:

      git init <dirname>
      
    2. Configure a remote there:

      git remote add --mirror=fetch origin <url>
      

      The --mirror=fetch option tells Git to set up the fetch mapping (what to fetch, and which local refs to update with the fetched data) so that everything local is forcibly overwritten with the remote state.

    3. Fetch all the data — overwriting everything local:

      git fetch -u origin
      

      The -u (or --update-head-ok) option permits Git to overwrite the branch pointed to by the HEAD reference. This pulls the rug out from under the index and the work tree, but we'll compensate for that in the next step.

    4. Force-update the index and the work tree using the new data:

      git reset --hard HEAD
      

      This makes Git overwrite the index and the work tree with the up-to-date state of the branch pointed at by HEAD — typically "master" but should you check another branch out (see below) it will obviously use that one.
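
    The four steps above can be wrapped into one function. This is a sketch, not a drop-in tool: the name mirror_checkout is invented here, and it assumes that the remote has a branch with the same name as the initial branch git init creates locally (for example, both "master"). If the names differ, HEAD points at a branch that does not exist after the fetch and the final reset fails.

```shell
#!/bin/sh
# mirror_checkout URL DIR  (hypothetical helper name)
# Creates a non-bare repository in DIR whose refs mirror those of URL.
# Assumption: the remote has a branch named like the initial branch that
# "git init" creates locally; otherwise the reset step fails.
mirror_checkout() {
    url=$1
    dir=$2
    git init --quiet "$dir" &&                                  # step 1
    git -C "$dir" remote add --mirror=fetch origin "$url" &&    # step 2
    git -C "$dir" fetch --quiet -u origin &&                    # step 3
    git -C "$dir" reset --quiet --hard HEAD                     # step 4
}
```

    For example: mirror_checkout "$url" "students/$name", where both variables are placeholders for your own values.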

    Then, to update the data next time, run:

    git fetch -u origin
    git reset --hard HEAD
    

    and then study what's in the work tree.

    If you need to view another branch, the usual

    git branch -a
    

    …observe the list and pick a branch, then

    git checkout <that_branch>
    

    will work.
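
    When inspecting many repos, a one-line summary per branch can be handier than the bare branch list. The following sketch uses git for-each-ref for that; the helper name branch_summary and the chosen output columns are just one possible choice.

```shell
#!/bin/sh
# branch_summary REPO_DIR  (hypothetical helper name)
# Prints one line per local branch: name, date of the last commit, subject.
branch_summary() {
    git -C "$1" for-each-ref \
        --format='%(refname:short) %(committerdate:short) %(subject)' \
        refs/heads
}
```

    In a repo set up with --mirror=fetch all fetched branches live under refs/heads, so this lists every branch of the source repo.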

    In essence, this whole dance with explicit repo initialization and adding a remote in a special way is needed because the --mirror option of git clone implies creating a bare repository, and we presumably want a normal one (I think).

    To update all the repos located in a directory, do

    find "$root_dir" -mindepth 1 -maxdepth 1 -type d -print \
        | while IFS= read -r repo; do
            # Subshell: each cd stays local to its own iteration
            ( cd "$repo" &&
              git fetch -u origin &&
              git reset --hard HEAD )
          done
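
    A variant of the same loop that skips non-repository directories and reports failures could look like the sketch below. The name update_all is hypothetical, and the layout (one repo per immediate, non-hidden subdirectory of the root, each with a remote called origin added with --mirror=fetch) is an assumption.

```shell
#!/bin/sh
# update_all ROOT_DIR  (hypothetical helper name)
# Runs "git fetch -u origin" and "git reset --hard HEAD" in every immediate
# subdirectory of ROOT_DIR that contains a .git directory.
# Returns non-zero if at least one repository failed to update.
update_all() {
    root=$1
    failed=0
    for repo in "$root"/*/; do
        # Skip plain directories (and the unexpanded pattern if nothing matched).
        [ -d "$repo/.git" ] || continue
        # Subshell: the cd does not affect the next iteration.
        if ! ( cd "$repo" &&
               git fetch --quiet -u origin &&
               git reset --quiet --hard HEAD ); then
            echo "update failed: $repo" >&2
            failed=$((failed + 1))
        fi
    done
    [ "$failed" -eq 0 ]
}
```

    Unlike the find pipeline, the glob does not match hidden directories, which is usually what you want here.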
    

    [1] Branches deleted in the remote repo are not deleted locally. To remove them, you have to run git remote prune origin.