OK. I'm on macOS.
alias gcurl
alias gcurl='curl -s -H "Authorization: token IcIcv21a5b20681e7eb8fe7a86ced5f9dbhahaLOL" '
echo $IG_API_URL
https://someinstance-git.mycompany.com/api/v3
I ran the following to see the list of all orgs the user has access to. NOTE for a new user: passing just $IG_API_URL here will give you all the REST endpoints you can use.
gcurl ${IG_API_URL}/user/orgs
Running the above gave me a nice JSON object as output, which I piped into jq to pull out the info, and now I finally have the corresponding git URL that I can use to clone each repo.
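In case it's useful, a minimal sketch of that step, assuming a GitHub Enterprise style API where /user/orgs returns objects with a login field and /orgs/{org}/repos returns objects with an ssh_url field (pagination omitted for brevity; your instance's fields may differ):
# Collect the SSH clone URL of every repo in every org I can see (first page per org only).
for org in $(gcurl "${IG_API_URL}/user/orgs" | jq -r '.[].login'); do
  gcurl "${IG_API_URL}/orgs/${org}/repos?per_page=100" | jq -r '.[].ssh_url'
done > repos.txt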
I created a master repo file:
git@someinstance-git.mycompany.com:someorg1/some-repo1.git
git@someinstance-git.mycompany.com:someorg1/some-repo2.git
git@someinstance-git.mycompany.com:someorg2/some-repo1.git
git@someinstance-git.mycompany.com:someorgN/some-repoM.git
...
There are some 1000+ such entries in this file.
I wrote a small one-liner script that reads the lines one by one (I know, it's sequential) and runs git clone, which works fine.
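The one-liner is roughly this shape (simplified sketch; my real version also builds a per-repo target directory name):
# Sequential version: clone one repo at a time from the master repo file.
while IFS= read -r git_url_line; do
  git clone "${git_url_line}"
done < repos.txt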
What I dislike, and am trying to find a better solution for, is:
1) It runs sequentially, one repo at a time, so it's slow.
2) I want the total time to be bounded by the slowest single clone. I.e. if repo A takes 3 seconds, B takes 20 and C takes 3, and every other repo takes under 10 seconds, is there a way to clone all the repos in roughly 20-30 seconds, instead of 3+20+3+...+... seconds, which adds up to many minutes?
To do that, the best my poor mind came up with was to run the git clone step in the background, so the loop could keep reading lines fast enough:
git clone ${git_url_line} $$_${datetimestamp}_${git_repo_fetch_from_url} &
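In loop form, that's roughly this (a sketch; basename here is just illustrative of how ${git_repo_fetch_from_url} gets derived):
# Background version: kick off every clone at once and never wait,
# so 1000+ git/ssh processes get spawned almost immediately
# (and the script itself exits right away).
while IFS= read -r git_url_line; do
  datetimestamp=$(date +%Y%m%d%H%M%S)
  git_repo_fetch_from_url=$(basename "${git_url_line}" .git)
  git clone "${git_url_line}" "$$_${datetimestamp}_${git_repo_fetch_from_url}" &
done < repos.txt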
Hey, the script ended quickly, and running ps -eAf | egrep "ssh|git"
showed something fun was running. Coincidentally, one of the guys shouted :) that Icinga was showing cool metrics for something spiking very high. I thought it might be due to me, but I figured I could run N git clones against my Git instance without causing a network outage or anything weird.
OK, things ran successfully for some time and I started seeing a bunch of git clone output on my screen. In a second session, I saw folders getting populated just fine, until I finally saw what I was hoping not to see:
Resolving deltas: 100% (3392/3392), done.
remote: Total 5050 (delta 0), reused 0 (delta 0), pack-reused 5050
Receiving objects: 100% (5050/5050), 108.50 MiB | 1.60 MiB/s, done.
Resolving deltas: 100% (1777/1777), done.
remote: Total 10691 (delta 0), reused 0 (delta 0), pack-reused 10691
Receiving objects: 100% (10691/10691), 180.86 MiB | 1.57 MiB/s, done.
Resolving deltas: 100% (5148/5148), done.
remote: Total 5994 (delta 6), reused 0 (delta 0), pack-reused 5968
Receiving objects: 100% (5994/5994), 637.66 MiB | 2.61 MiB/s, done.
Resolving deltas: 100% (3017/3017), done.
Checking out files: 100% (794/794), done.
packet_write_wait: Connection to 10.20.30.40 port 22: Broken pipe
fatal: The remote end hung up unexpectedly
fatal: early EOF
fatal: index-pack failed
I suspect you're exhausting resources either on your local machine or on the remote machine by starting ~1000 processes at once. You probably want to limit the number of processes started. One technique for that is to use xargs.
If you have access to GNU xargs, it might look something like this:
xargs --replace -P10 git clone {} < repos.txt
-P10 is "10 processes"
--replace - replace the {} with the mapped argument
If you're stuck with the crippled BSD xargs, such as on OS X (or want higher compatibility), you can use the more portable:
xargs -I{} -P10 git clone {} < repos.txt
This form will work with GNU xargs as well.
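If you also want the question's timestamped target-directory naming, you can wrap the clone in a small sh -c (just a sketch; adjust the naming to taste):
# Each line of repos.txt becomes $1 inside the wrapper; -P10 still caps concurrency at 10.
xargs -I{} -P10 sh -c 'git clone "$1" "$$_$(date +%Y%m%d%H%M%S)_$(basename "$1" .git)"' _ {} < repos.txt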