
GitHub GraphQL v4 API nested pagination (Multiple pagination cursors cannot be followed in a single query)


Let's paint a hypothetical picture for discussion.

Let's say a large company has 200 organizations each with 250 repositories and each of those repositories has 300 contributors.

Let's say I would like to build up a GraphQL query that answers the question:

Give me all contributors (and their privileges) of all repositories of all organizations in my account.

Obviously, pagination is needed.

But as the API is currently implemented, a separate pagination cursor is provided for each list of contributors, each list of repositories, and each list of organizations.

As a result, it is not possible to complete the query by following a single pagination cursor.

It is not clear to me that the query can be completed at all, due to the ambiguity of specifying a pagination cursor for the list of contributors of one org/repo combination versus the next.

Thanks


Solution

  • Your initial query structure looks something like this (simplified):

    query {
      organizations(first: 10) {
        repositories(first: 20) {
          contributors(first: 30) {
            name
            privileges
          }
        }
      }
    }
    

    Now imagine this query returned a single pagination cursor. What should the next page look like?

    1. next 10 organizations (with first 20 repositories, with first 30 contributors)
    2. same 10 organizations, but next 20 repositories (with first 30 contributors)
    3. same 10 organizations, with the same 20 repositories, but next 30 contributors
    4. some wild mix of the above

    When you build your own GraphQL API, you can design your cursor pagination according to your needs. But the GitHub API has to serve a wide range of consumers, so they chose a very flexible schema design that enables clients to fetch exactly the data they need, without overfetching. The trade-off is that in some cases it takes additional roundtrips to get all the data you need.
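
    Instead of one global cursor, every connection in the response carries its own cursor. Sticking with the simplified field names from above (the real GitHub schema wraps each connection in nodes/pageInfo, per the Relay connection convention, and uses different field names), a sketch of the actual shape looks roughly like this:

    query {
      organizations(first: 10) {
        pageInfo { hasNextPage endCursor }         # cursor for the org list
        nodes {
          repositories(first: 20) {
            pageInfo { hasNextPage endCursor }     # one cursor per org's repo list
            nodes {
              contributors(first: 30) {
                pageInfo { hasNextPage endCursor } # one cursor per repo's contributor list
                nodes {
                  name
                  privileges
                }
              }
            }
          }
        }
      }
    }

    Each endCursor unambiguously identifies a position in exactly one list, which resolves the ambiguity raised in the question: "more contributors" is always relative to one specific org/repo combination.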


    Let's look at this from a frontend perspective:

    After the initial request you will display the first 10 orgs, and for each org the first 20 repos, and for each repo the first 30 contributors.

    Now the user can decide which data they want more of:

    • either load more orgs, or
    • load more repos for a specific org, or
    • load more contributors for a specific repo

    Each of these decisions results in a simple paginated query with one of the cursors the GitHub API provided, as shown below. No need for an almighty pagination cursor.
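
    For example, "load more repos for a specific org" becomes a query like this minimal sketch (organization(login:) is a root field of GitHub's API; "my-org" and the after value are placeholders - the real after value is the opaque endCursor string from the previous response):

    query {
      organization(login: "my-org") {
        repositories(first: 20, after: "ENDCURSOR_FROM_PREVIOUS_RESPONSE") {
          pageInfo { hasNextPage endCursor }
          nodes { name }
        }
      }
    }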

    (I highly doubt that there's a UI/UX use case where you want to paginate everything at once.)

    In this case, though, I'd say that the GitHub API is perfectly suited as it is. In my opinion it's not reasonable to display 200 * 250 * 300 = 15,000,000 contributor entries at once, because from a user's perspective that's just way too much.


    Let's look at this from a backend perspective:

    If you want to gather the data you described for analysis, aggregation or something similar on your backend server, and you already know that you need all the data, you may be able to skip pagination entirely by providing a large number for first. (This may not work for GitHub's API - as far as I know, it is limited to a maximum of 100 entries per page.)

    Even if you are forced to use pagination, you are able to cache the results. Of course it still takes a few hundred roundtrips to the GitHub API, but this can be a scheduled job that runs once every night.
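
    Such a job boils down to repeating one parameterized query per list, feeding each response's endCursor back in as the cursor variable until hasNextPage is false. A minimal sketch for one of those loops, assuming the 100-entry page limit mentioned above ($cursor is null on the first request):

    query($login: String!, $cursor: String) {
      organization(login: $login) {
        repositories(first: 100, after: $cursor) {
          pageInfo { hasNextPage endCursor }
          nodes { name }
        }
      }
    }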

    And because at this point you've already written all the necessary code, it's easy to implement some kind of partial refresh. For example, if you know that "repo 42 of org 13" is pretty active, you can simply refetch the data for this specific repo (on demand or at a shorter interval) and update your cache, as sketched below.
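
    A refresh like that targets the repository directly via the repository(owner:, name:) root field. The owner/name values below stand in for the hypothetical "repo 42 of org 13"; collaborators, with a permission field on its edges, is as far as I know the closest real-schema equivalent of the question's "contributors and their privileges":

    query {
      repository(owner: "org13", name: "repo42") {
        collaborators(first: 100) {
          pageInfo { hasNextPage endCursor }
          edges {
            permission          # the collaborator's privilege level
            node { login }
          }
        }
      }
    }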


    I don't know your specific use case, but as long as you don't need (nearly) live updates of this huge data set, I'd say that GitHub's API is sufficient and flexible enough for most people's requirements.