Search code examples
githuborganizationdiscoverability

What is a good way to allow the wider discoverability of private GitHub repositories?


If you are in an organisation, there may be GitHub repositories that are private (i.e. you don't have access to them), but it would be useful to know that they exist, and then you could arrange access where appropriate.

In other words we are trying to enable discoverability, in a way that can lead to access. This could be done with sharing readme's (noting that people need to have some discipline to write sensible readme's).

This blog post Solving the innersource discoverability problem looks like a potential solution, but may require that the user has access to see all the repos in the portal? I'd like for the user to be able to view readme's for all repos - if they don't have access, the can contact whoever is listed on the readme.

I see another option for making a file public from a private repo (using gitexporter to create a public repo with only the readme, example here. This makes it public, not my first preference, and would require every repo to do some work, far from ideal. While it doesn't give a neat portal, it should allow GitHub search functionality to find it by topic or keyword?

A related, perhaps simpler option is proposed here, where a student shares a readme from a private repo as a public GitHub page. Again, requires a little work from every repo, no neat portal, but can be found with GitHub search? While public Github pages can be made private, then would only be visible to those with repo access?

So, if I'm summarising basic requirements:

  • All org repos (public, private or team) have a readme that can be accessed by search by someone in the org (preferably not requiring each individual to modify their repo).

Additional nice to have features:

  • All readmes can be viewed in a portal with search
  • Bonus for being able to make super private (only collaborators can see readme - flag in readme?), org private (only people in the org - default) and public (flag in readme?).
  • Simple to implement!

Suggestions?


Solution

  • I think you have already provided a suitable solution for it here already within your question. Alternatively, you can use APIs (GET repos, GET README of a repo) to get each repositories README and save it to a database/JSON based on a cron scheduler and create a web interface based on that data.

    But, I'm gonna elaborate on a few areas of improvement. The problem I see with this is the nature of the search. We aren't always looking for keywords, sometimes we are trying to find a potential fuzzy match for our problem, especially in the case of a larger organization with more than a couple of thousand repositories. In those cases, a search engine implementation will provide much better results. In my opinion, we should collect the README and FAQs and put them into Elastic search, expose search API for queries. The collection of README and FAQs should be part of the CI/CD pipeline, and while pushing new versions to artifactory it must publish metadata as well.