
Get the top 10 JavaScript open source repositories ranked by stars using the GitHub GraphQL API


I would like to get the top 10 JavaScript open source repositories ranked by stars (along with some related information) using the GitHub GraphQL API in a Python project. I have this query so far:

query{
  search(type: REPOSITORY, query: "language:javascript", first:10) {
    userCount
    edges {
      node {
        ... on Repository {
          name
          url
          stargazers {
            totalCount
          }
          owner{
            login
          }
        }
      }
    }
  }
}

The problem is that it does not always return the same result: each run returns a different set of 10 repositories, ordered by star count among themselves, rather than the absolute top 10.

And on top of that I’d like to get the ones that are open source.

I use the query

query{
licenses{name}
}

to get a list of licenses, but I don’t know whether this list is exhaustive (it seems to be missing some licenses, such as MIT). According to the documentation, this field should:

Return a list of known open source licenses.

How can I get an exhaustive list of licenses and add it to my main query above to make my search more precise?

I can't seem to find clear answers, as the documentation for the GitHub GraphQL API is scarce and quite vague.

Thanks


Solution

  • I got a partial explanation from GitHub Support for why the results are inconsistent: search queries that run for too long hit a timeout.

    Some queries are computationally expensive for our search infrastructure to execute. To keep search fast for everyone, we limit how long any individual query can run. In rare situations when a query exceeds the time limit, search returns all matches that were found prior to the timeout and informs you that a timeout occurred.

    Reaching a timeout does not necessarily mean that search results are incomplete. It just means that the query was discontinued before it searched through all possible data.

    Our team wrote about this here:

    https://help.github.com/articles/troubleshooting-search-queries/#potential-timeouts

    Given this reality, these timeouts may cause inconsistencies while paging through the results. We see how this could be improved in future iterations of search, so we've let our team know so they're aware though we can't make any promises on specific changes.

    Edit: As suggested by GitHub Support, adding a star threshold to the search string, i.e. query: "language:javascript stars:>1600", consistently returns the top 10 repos ordered by stars, presumably because the narrower search finishes before the timeout. 1600 is roughly the minimum star count of the top 3000 repos; the threshold needs to be high enough to narrow the search.
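
Since the question mentions a Python project, here is a minimal sketch of how the narrowed search could be sent from Python using only the standard library. The function names (`build_search_string`, `build_payload`, `fetch_top_repositories`) are hypothetical helpers, not part of any GitHub library. The `stars:>1600` threshold comes from the support reply above. The `sort:stars` and `license:` qualifiers were not part of that reply; they are added here on the assumption that GitHub's regular repository search syntax also applies to the GraphQL `search` field.

```python
import json
import urllib.request

GITHUB_GRAPHQL_URL = "https://api.github.com/graphql"

# Same shape as the query in the question, with the search string and
# result count passed in as GraphQL variables.
SEARCH_QUERY = """
query ($searchQuery: String!, $count: Int!) {
  search(type: REPOSITORY, query: $searchQuery, first: $count) {
    repositoryCount
    edges {
      node {
        ... on Repository {
          name
          url
          stargazers { totalCount }
          owner { login }
        }
      }
    }
  }
}
"""


def build_search_string(language, min_stars, license_key=None):
    """Build the search string, e.g. 'language:javascript stars:>1600 sort:stars'."""
    # stars:>N is the fix suggested by GitHub Support; sort:stars and
    # license:KEY are assumptions based on GitHub's search syntax.
    parts = [f"language:{language}", f"stars:>{min_stars}", "sort:stars"]
    if license_key:
        parts.append(f"license:{license_key}")
    return " ".join(parts)


def build_payload(language="javascript", min_stars=1600, count=10, license_key=None):
    """Return the JSON-serializable body for the GraphQL POST request."""
    return {
        "query": SEARCH_QUERY,
        "variables": {
            "searchQuery": build_search_string(language, min_stars, license_key),
            "count": count,
        },
    }


def fetch_top_repositories(token, **kwargs):
    """POST the query and return the list of repository nodes.

    `token` is a personal access token supplied by the caller.
    """
    body = json.dumps(build_payload(**kwargs)).encode("utf-8")
    request = urllib.request.Request(
        GITHUB_GRAPHQL_URL,
        data=body,
        headers={
            "Authorization": f"bearer {token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(request) as response:
        result = json.load(response)
    return [edge["node"] for edge in result["data"]["search"]["edges"]]
```

Calling `fetch_top_repositories(token)` with a valid token should then return up to 10 repository nodes. Passing `license_key="mit"` would restrict the search to one license at a time, so covering several open source licenses would take one request per license under this approach.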