Tags: github, web-crawler, archiving

List all public GitHub repositories as links


I need an index page that shows links to all GitHub repositories.

I think the lack of such a page is the reason why many repos are not found by crawlers like the Wayback Machine.

I think that if such a site existed and ranked highly, they would start crawling it.

The developer site says there is an API for getting all repos.
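One REST endpoint that fits this description is `GET /repositories` ("List public repositories"), which returns public repositories ordered by ID and pages through them with a `since` parameter (only repositories with a larger ID are returned) and a `Link` response header. The sketch below is a minimal, non-authoritative illustration under that assumption: the helper names (`next_since`, `seconds_until_reset`, `render_index`, `crawl`) are hypothetical, and the rate-limit handling relies on the `X-RateLimit-Remaining` and `X-RateLimit-Reset` headers GitHub sends with API responses.

```python
import html
import json
import re
import time
import urllib.request

# Assumed endpoint: GitHub REST "List public repositories", paged via ?since=<repo id>
API = "https://api.github.com/repositories"


def next_since(link_header):
    """Return the `since` value of the rel="next" link, or None if there is none."""
    if not link_header:
        return None
    for part in link_header.split(","):
        match = re.search(r'<([^>]+)>;\s*rel="next"', part)
        if match:
            since = re.search(r"[?&]since=(\d+)", match.group(1))
            return int(since.group(1)) if since else None
    return None


def seconds_until_reset(headers, now=None):
    """Seconds to wait when the rate limit is used up (0 while calls remain)."""
    if int(headers.get("X-RateLimit-Remaining", "1")) > 0:
        return 0
    now = time.time() if now is None else now
    return max(0, int(headers.get("X-RateLimit-Reset", "0")) - int(now))


def render_index(repos):
    """Render a minimal HTML page with one link per repository."""
    items = "\n".join(
        '<li><a href="{0}">{1}</a></li>'.format(
            html.escape(r["html_url"], quote=True), html.escape(r["full_name"])
        )
        for r in repos
    )
    return "<!DOCTYPE html>\n<ul>\n" + items + "\n</ul>\n"


def crawl(since=0, max_pages=3, token=None):
    """Hypothetical crawler: fetch up to `max_pages` pages, oldest repositories first."""
    repos = []
    for _ in range(max_pages):
        req = urllib.request.Request(f"{API}?since={since}")
        req.add_header("Accept", "application/vnd.github+json")
        if token:  # authenticated requests get a higher rate limit
            req.add_header("Authorization", f"Bearer {token}")
        with urllib.request.urlopen(req) as resp:
            repos.extend(json.load(resp))
            headers = resp.headers
        time.sleep(seconds_until_reset(headers))
        since = next_since(headers.get("Link"))
        if since is None:
            break
    return repos
```

Because the listing is ordered by repository ID, storing the last `since` value lets a long crawl resume where it stopped instead of starting over, e.g. `open("index.html", "w").write(render_index(crawl(since=0)))` for a first small batch.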


Solution

  • Warning: GitHub hosts a huge number of repositories. You'll have to take this into account when designing your index.

    I can think of a few options:

    • The legacy GitHub search API. You'll have to cope with the API rate limit though.
    • This Stack Overflow answer could be a good start to get a rough grasp of the number of repos per language.
    • Leveraging the GitHub Archive project, which records the public GitHub timeline. (Note: as the project only exposes events dating back to February 12, 2011, you won't get any data about repositories that have shown no activity since that date.)
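For the GitHub Archive option, a minimal sketch might collect repository names from one of the project's hourly dumps. This assumes the dumps are gzipped, newline-delimited JSON where each event carries a `repo` object with a `name` of the form `owner/name` (the format of the files served from data.gharchive.org); the event schema of the oldest dumps may differ, so check the field names for the period you download. The function names `repos_from_archive` and `to_links` are hypothetical.

```python
import gzip
import io
import json


def repos_from_archive(stream):
    """Collect unique 'owner/name' strings from one gzipped NDJSON event dump."""
    names = set()
    with gzip.open(stream, "rt", encoding="utf-8") as lines:
        for line in lines:
            line = line.strip()
            if not line:
                continue
            event = json.loads(line)
            repo = event.get("repo") or {}  # assumed field; may differ in early dumps
            if repo.get("name"):
                names.add(repo["name"])
    return names


def to_links(names):
    """Turn 'owner/name' strings into sorted github.com URLs for the index page."""
    return [f"https://github.com/{name}" for name in sorted(names)]
```

Usage, with a dump downloaded beforehand: `with open("2015-01-01-15.json.gz", "rb") as f: links = to_links(repos_from_archive(f))`. Merging the sets from many hourly files de-duplicates repositories that appear in several events, which keeps the index from growing with activity rather than with the number of repos.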