Multiple ways to identify a single resource in a RESTful API

Imagine an HTTP REST API for retrieving the list of commits for a Git repository:

/:repository/:branch/commits

I would like to extend the API so that it can return information about a single commit, identified either by its SHA or by some other criterion, such as being the most recent commit. This can be done in two different ways: with path parameters or with query parameters.

It's common to use path parameters for the main resource identifiers:

/:repository/:branch/commits/:sha

Now the URL for the latest commit can be:

/:repository/:branch/commits/latest

This will lead to the last URL component being treated either as a special identifier or as a commit SHA. This will work as long as "latest" is not a valid commit identifier, but it doesn't feel right: the same URL component has a different role depending on its value.

Another alternative is to use a query parameter:

/:repository/:branch/commits?latest (or ?latest=true)

In this case, identifying a commit by a SHA and identifying it by a different criterion are cleanly separated. The /commits API endpoint becomes asymmetrical, however: the response has a different structure depending on which query parameters are present. It is a single object if the latest commit is requested, and an array otherwise (e.g. with no filtering, or when filtering by author). Returning a single-object array for the latest commit doesn't feel right either.

Neither of the two approaches results in a consistent API. Is there another way that doesn't have the same drawbacks? Are there concerns I haven't considered that make one design preferable to the other?

Solution

Multiple ways to identify a single resource in a RESTful API

It would be wise, I think, to review the source material on resources; see Fielding, 2000:

A resource is a conceptual mapping to a set of entities....

For example, the "authors' preferred version" of an academic paper is a mapping whose value changes over time, whereas a mapping to "the paper published in the proceedings of conference X" is static. These are two distinct resources, even if they both map to the same value at some point in time. The distinction is necessary so that both resources can be identified and referenced independently. A similar example from software engineering is the separate identification of a version-controlled source code file when referring to the "latest revision", "revision number 1.2.7", or "revision included with the Orange release."

Conceptually, there is nothing wrong with having more than one identifier for the same resource:

GET /f8b0440e-1d65-4800-9e79-ef01183062da
GET /80871fe0-c414-4ec0-b1b3-2c3f0521e2ab

But general-purpose components have no way of knowing which identifiers reference a common resource and which reference distinct resources. As a consequence, from the client's perspective, each unique identifier implies a unique resource.

For example, invalidating a cached representation stored using one identifier does not also invalidate representations that use other identifiers.
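To make the caching point concrete, here is a minimal sketch of a cache keyed by URI, which is roughly how a general-purpose HTTP cache sees the world. The `store` and `invalidate` helpers, the URIs, and the commit data are all hypothetical, chosen only for illustration.

```python
# Hypothetical in-memory cache keyed by request URI. Nothing in it knows
# that two different URIs might identify the same underlying commit.
cache = {}

def store(uri, representation):
    cache[uri] = representation

def invalidate(uri):
    # A generic cache can only drop the entry for this exact URI.
    cache.pop(uri, None)

# Two identifiers that currently map to the same commit (made-up values).
store("/repo/main/commits/latest", {"sha": "abc123"})
store("/repo/main/commits/abc123", {"sha": "abc123"})

invalidate("/repo/main/commits/abc123")

# The representation stored under the other identifier is untouched.
still_cached = "/repo/main/commits/latest" in cache
```

Only the entry under the exact URI is removed; the copy stored under the other identifier stays until it is invalidated or expires on its own.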

This will lead to the last URL component being treated either as a special identifier or as a commit SHA. This will work as long as "latest" is not a valid commit identifier, but it doesn't feel right: the same URL component has a different role depending on its value.

There's nothing wrong with the same URL component having a different role based on its value. You just add logic to distinguish the role. Fundamentally, this logic isn't any different from what you already use to get the request to the correct handler: we parse the URI and use the data we find to branch to the correct handler.
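A minimal sketch of that branching logic, assuming commit identifiers are abbreviated or full hexadecimal SHA-1 strings. The handler names, the `SHA_PATTERN` rule, and the returned dictionaries are hypothetical stand-ins, not part of any real API.

```python
import re

# Assumption: a commit identifier is 7 to 40 lowercase hex characters.
SHA_PATTERN = re.compile(r"[0-9a-f]{7,40}")

def get_latest_commit(repository, branch):
    # Stand-in for the real lookup.
    return {"handler": "latest", "repository": repository, "branch": branch}

def get_commit_by_sha(repository, branch, sha):
    # Stand-in for the real lookup.
    return {"handler": "sha", "repository": repository, "branch": branch, "sha": sha}

def handle_commit(repository, branch, segment):
    """Branch on the value of the last path segment."""
    if segment == "latest":
        return get_latest_commit(repository, branch)
    if SHA_PATTERN.fullmatch(segment):
        return get_commit_by_sha(repository, branch, segment)
    raise LookupError(f"not a commit identifier: {segment!r}")
```

Under this assumption the two roles cannot collide, because "latest" contains characters that are not hexadecimal digits. If the identifier space did allow the word "latest", this check would no longer be enough to tell the two cases apart.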

The fact that the information is ambiguous, on the other hand, means that we are going to have a miserable time trying to ensure that we return the correct representation.

The answer, of course, is to add hints to the URI so that the meaning is no longer ambiguous. You can do that via path segments or via the query part, as you prefer.

In this case, identifying a commit by a SHA and identifying it by a different criterion are cleanly separated. The /commits API endpoint becomes asymmetrical, however: the response has a different structure depending on which query parameters are present. It is a single object if the latest commit is requested, and an array otherwise (e.g. with no filtering, or when filtering by author).

Yes, and so what? The client doesn't care: it is just asking for a representation of the resource with a given identifier, and how you produce that is up to you. The code that produces the representations is completely unaffected. The only piece you need to add is a bit of logic that determines how your endpoint delegates the work.
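As a rough sketch of that delegation, the endpoint below inspects the query part and hands off to one of two functions that know nothing about each other. The `COMMITS` data, the function names, and the `author` filter are hypothetical; only `urlsplit` and `parse_qs` come from the Python standard library.

```python
from urllib.parse import parse_qs, urlsplit

# Made-up commit data, oldest first.
COMMITS = [
    {"sha": "a1", "author": "ann"},
    {"sha": "b2", "author": "bob"},
]

def list_commits(author=None):
    # Representation of the collection: always a list.
    return [c for c in COMMITS if author is None or c["author"] == author]

def latest_commit():
    # Representation of a single commit: always one object.
    return COMMITS[-1]

def commits_endpoint(url):
    """Decide which resource the identifier names, then delegate."""
    # keep_blank_values=True lets a bare "?latest" show up as a key.
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    if "latest" in query:
        return latest_commit()
    return list_commits(author=query.get("author", [None])[0])
```

Here `/repo/main/commits?latest` and `/repo/main/commits?latest=true` both reach `latest_commit`, while `/repo/main/commits?author=ann` reaches `list_commits`. The two producing functions stay unchanged whichever spelling of the identifier you settle on.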

But if, for example, your goal is to choose resource identifiers so that the logical complexity can be hidden within your general-purpose routing code, you might consider something like:

/:repository/:branch/commits
/:repository/:branch/commits/:sha
/:repository/:branch/latest

or something like one of these:

/:repository/:branch/commits
/:repository/:branch/commits/sha?:sha
/:repository/:branch/commits/latest

/:repository/:branch/commits
/:repository/:branch/commits/sha=:sha
/:repository/:branch/commits/latest

/:repository/:branch/commits
/:repository/:branch/commits/sha/:sha
/:repository/:branch/commits/latest

These are all fine; choose the spellings that best fit the constraints of your context. For instance, if you expect to use HTML forms to access your resources, you are going to be more interested in designs that use key-value pairs in the query part.
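To illustrate how such spellings push the decision into the routing layer, here is a minimal sketch of a route table for the `/commits/sha/:sha` variant. The regular expressions, the handler names, and the assumption that repository and branch names contain no slashes are illustrative choices, not a description of any particular framework.

```python
import re

# Assumption: repository and branch names contain no "/" characters.
PREFIX = r"^/(?P<repository>[^/]+)/(?P<branch>[^/]+)/commits"

# Each pattern is unambiguous on its own, so no handler has to inspect values.
ROUTES = [
    (re.compile(PREFIX + r"$"), "list_commits"),
    (re.compile(PREFIX + r"/sha/(?P<sha>[^/]+)$"), "commit_by_sha"),
    (re.compile(PREFIX + r"/latest$"), "latest_commit"),
]

def route(path):
    """Return (handler name, path parameters), or None if nothing matches."""
    for pattern, name in ROUTES:
        match = pattern.match(path)
        if match:
            return name, match.groupdict()
    return None
```

With this layout, `/repo/main/commits/latest` and `/repo/main/commits/sha/latest` resolve to different handlers, so even an identifier that happened to be spelled "latest" would not be ambiguous.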