
Counting and analyzing commits in a GitHub organization (not a single repo)


I'd like to count the commits made in 2012 across all repositories in http://github.com/plone and http://github.com/collective

Are there any tools that do this, i.e. provide statistics for GitHub organizations?

Do I need to write my own script to scrape the repositories, check them out individually and count the commits?


Solution

  • Here is how I'd do it:

    • Use the GitHub API to enumerate the repositories (see the JSON for Plone for an example). Loop over the JSON result and, for each repository:
      • Clone the repository (the git_url URL) with git clone --bare, which fetches only the git data, with no working copy. This creates a <repository_name>.git directory, e.g. plone.event.git if you cloned git://github.com/plone/plone.event.git.
      • Count the revisions with git --git-dir=<git_directory> rev-list HEAD --count. This prints the count to stdout, so subprocess.check_output() should do the job just fine.
      • Remove the <repository_name>.git directory again.

    That only requires 2 API calls, so you avoid being rate limited. Paging through all the commits with the API would take far too many requests to count every repository's commits, and cloning a bare copy of each repository would be faster anyway.