How to address ambiguous 404s when designing a RESTful API

I've come across this curious scenario while writing tests + documentation for a REST API I am developing. According to this REST tutorial, a key abstraction in a RESTful API is the resource, and a common pattern is for resources to contain resources of their own. Returning 404 for an ID'd resource that does not actually exist is an equally common pattern.

My question comes from the fact that a 404 response code can be ambiguous, given the hierarchical nature of a REST API.

For example, assume the data layer our REST API interacts with has the following data:

{
  "users": {
    "foo": {
      "notes": {
        "hello": "world"
      }
    }
  }
}

Calls to our REST API that return 200 imply that all resources in the path exist:

GET /users/foo returns 200 because the user foo exists.
GET /users/foo/notes returns 200 for the same reason.
GET /users/foo/notes/hello returns 200 because the user foo and a note named hello belonging to foo both exist.

Some paths are also expected to return 404:

GET /users/bar returns 404. That is unambiguous, since the 404 can only refer to one resource.
GET /users/bar/notes returns 404. This is just as unambiguous (assuming the API does not return 404 for nonexistent paths).

But consider that the following return 404 for different and ambiguous reasons:

GET /users/bar/notes/baz returns 404 because the user bar does not exist.
GET /users/foo/notes/baz returns 404 because the existing user foo does not have a baz note.

In short, the 404s returned do not inform the client what exactly failed to be found: the user or the note. So my question is as follows:
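To make the ambiguity concrete, here is a minimal Python sketch of a path lookup against the sample data above. The `get_status` helper and the segment-by-segment walk are assumptions made for illustration, not the actual implementation of the API in question.

```python
from http import HTTPStatus

# The sample data from the question.
DATA = {"users": {"foo": {"notes": {"hello": "world"}}}}


def get_status(path: str) -> int:
    """Walk the nested data one path segment at a time (hypothetical helper)."""
    node = DATA
    for segment in path.strip("/").split("/"):
        if not isinstance(node, dict) or segment not in node:
            # The same status is returned no matter which segment was missing.
            return HTTPStatus.NOT_FOUND
        node = node[segment]
    return HTTPStatus.OK


for path in ("/users/foo/notes/hello", "/users/bar/notes/baz", "/users/foo/notes/baz"):
    print(path, int(get_status(path)))
```

The last two paths both print 404, although the first fails at the user and the second at the note.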

Is it the responsibility of the server to be nonambiguous with 404 response codes? And if so, how should it differentiate to the client the nonexistence of a user versus the nonexistence of a user's note?

Solution

Is it the responsibility of the server to be nonambiguous with 404 response codes? And if so, how should it differentiate to the client the nonexistence of a user versus the nonexistence of a user's note?

By providing "a representation containing an explanation of the error situation, and whether it is a temporary or permanent condition", as described in RFC 7231.

In other words, put the explanatory details into the document that you include in the HTTP response.

It may help to think more carefully about how all this works with web pages.

The status code is metadata in the transfer of documents over a network domain. The intended audience for that information is the web browser (and other general purpose components - spiders, caches, and so on). It's provided so that your browser (and other general purpose components) can correctly interpret the semantics of the response.

The audience for the "representation of the error" is the human being using the web browser. That's the place where one would provide, for example, information about what specifically has gone wrong, or what corrective actions might be taken.

These days, we often expect bespoke machine clients, rather than humans, to be the ones looking at the "web browser". Free-form text, with or without hypermedia controls, isn't likely to be useful to them. So we probably want to use problem details (RFC 7807), a standardized schema for reporting problems.
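As a rough sketch of what that could look like for the user/note example, the Python below builds two different problem details bodies for the same 404 status. The member names (`type`, `title`, `status`, `detail`) and the `application/problem+json` media type come from the problem details specification (RFC 7807, since revised as RFC 9457). The `get_note` and `_not_found` functions and the `https://api.example.com/problems/...` URIs are hypothetical, chosen only for illustration.

```python
import json

# The sample data from the question.
DATA = {"users": {"foo": {"notes": {"hello": "world"}}}}


def _not_found(problem: str, title: str, detail: str):
    """Build a 404 response carrying a problem details document."""
    body = {
        # Hypothetical problem type URI; a real API would define and document its own.
        "type": f"https://api.example.com/problems/{problem}",
        "title": title,
        "status": 404,
        "detail": detail,
    }
    return 404, "application/problem+json", json.dumps(body)


def get_note(user_id: str, note_id: str):
    """Return (status, content_type, body) for GET /users/{user_id}/notes/{note_id}."""
    user = DATA["users"].get(user_id)
    if user is None:
        return _not_found("user-not-found", "User not found",
                          f"No user named {user_id!r} exists.")
    note = user["notes"].get(note_id)
    if note is None:
        return _not_found("note-not-found", "Note not found",
                          f"User {user_id!r} has no note named {note_id!r}.")
    return 200, "application/json", json.dumps(note)
```

A machine client can branch on the `type` member, while general-purpose components such as caches still see an ordinary 404.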

One difficulty you may be having (not your fault; the literature sucks) is recognizing that identifiers are semantically opaque. /users/foo/notes/baz does not, generally, have any dependency on /users/foo/notes or any of the other prefixes. Nor does the identifier mean that /users/foo/notes/baz has four different parts that need to be satisfied.

Identifiers should be understood like keys into a map/dictionary - 200 means that the key exists in the map, 404 means the key doesn't exist in the map. But that doesn't actually tell you anything about the presence or absence of other keys with similar spellings!
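The map analogy can be sketched in a few lines of Python. The `RESOURCES` dictionary and `status` function are made up for illustration, and the resource model is deliberately unconventional (unlike the API in the question) to show that nothing in HTTP requires a prefix of an identifier to exist.

```python
# A deliberately unconventional resource model: one key, none of its prefixes.
RESOURCES = {
    "/users/foo/notes/hello": "world",
}


def status(identifier: str) -> int:
    """200 if the exact identifier is a key in the map, 404 otherwise."""
    return 200 if identifier in RESOURCES else 404


print(status("/users/foo/notes/hello"))  # key is present
print(status("/users/foo/notes"))        # similar spelling, but a different key
```

The first call yields 200 and the second 404. The identifier is treated as a whole, never parsed into parts that each have to be satisfied.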

Is your API, which conventionally organizes its resource model into a hierarchy, and chooses identifiers that are closely aligned with that hierarchy, "better" than an API that uses an unconventional resource model and arbitrary identifiers? Probably.

But good resource models and good identifier spelling conventions are not a REST constraint, and the HTTP and URI specifications also support designs that don't follow the current conventions (among other things, backwards compatibility is really important to REST and the web; REST and the web predate these spelling conventions by quite a bit).

(Analogy: we have coding conventions that describe "best practices" around ideas like variable naming and function naming because we use languages that don't restrict us to using "good" names. The machines don't care.)