
HAL - is it a violation of the HAL format/standard if links are in the main body?


According to the HAL standard (see here and here) the links to other resources should be placed in a specific embedded section.

So for instance this is not valid HAL, is my understanding correct?

{
  "movies": [
     {
       "id": "123",
       "title": "Movie title 1",
       "_links": {
           "subtitles": {
              "href": "/movies/123/subtitles"
           }
        }
     },{
       "id": "456",
       "title": "Movie title 2",
       "_links": {
           "subtitles": {
              "href": "/movies/456/subtitles"
           }
        }
     }
  ],
  "_links": {
      "self": {
         "href": "/movies"
      }
   }
}

The reason the above JSON would not be valid HAL is that the links should be placed in an embedded section ("_embedded") that refers back to the IDs in the main body. So the right approach would be:

   {
      "movies": [
         {
           "id": "123",
           "title": "Movie title 1"
         },{
           "id": "456",
           "title": "Movie title 2"
         }
      ],
      "_embedded": {
          "movies": [
             {
               "id": "123",
               "_links": {
                 "href": "movies/123/subtitles"
               }
             },
             {
               "id": "456",
               "_links": {
                 "href": "movies/456/subtitles"
               }
             }
          ]
      },
      "_links": {
          "self": {
             "href": "/movies"
          }
       }
    }

Is all the above correct?

Thanks


Solution

  • I'm going to use this as a case study in redesigning with HAL. @Darrel Miller's answer is good, but not great, and leaves some things I think should be cleared up. This is going to be LONG.

    The big question is what the context is, i.e. what is this resource you are returning? All you've got is something that relates to movies somehow. Suppose this is a search result: having a movie relationship is the wrong approach, because each movie relates to the search result (the top-level resource) as an item. So it should be something more like:

    {
       "title": "Results for search XXXXX",
       "_links": {
          "self": { "href": "https://host.com/search/with/params/XXXXX" },
          "item": [
             { "href": "https://host.com/url/to/movie/result/one", "title": "A great Movie" },
             { "href": "https://host.com/url/to/movie/result/two", "title": "A Terrible Movie" }
          ]
       }
    }
    

    But this structure would be expensive for a client to build a UI around, as it would have to make 3 calls, following the N+1 rule (1 for the result set, then 1 for each of the N results). Thus was born _embedded, which is just HAL's implementation of the hypertext pre-fetch pattern (in HTTP/2 the server could actually send each result as its own document, the client's cache would be pre-filled with those results, and you wouldn't necessarily need _embedded). That structure looks more like this:

    {
       "title": "Results for search XXXXX",
       "_links": {
          "self": { "href": "https://host.com/search/with/params/XXXXX" },
          "item": [
             { "href": "https://host.com/url/to/movie/result/one", "title": "A great Movie" },
             { "href": "https://host.com/url/to/movie/result/two", "title": "A Terrible Movie" }
          ]
       },
       "_embedded": {
          "item": [
             {
                "_links": {
                   "profile": { "href": "https://host.com/result-movie" },
                   "canonical": { "href": "https://host.com/url/to/movie/result/one" }
                },
                "title": "a great movie",
                "rating": "PG"
             },
             {
                "_links": {
                   "profile": { "href": "https://host.com/result-movie" },
                   "canonical": { "href": "https://host.com/url/to/movie/result/two" }
                },
                "title": "a terrible movie",
                "rating": "G"
             }
          ]
       }
    }
    

    That's pretty great: 1 HTTP request to get 3 resources. 1 request, not N+1. Thank you, HAL!

    So why item? Well, does the search result ONLY EVER contain movies? That's very unlikely, and even if it does today, do you want it to only ever contain movies tomorrow? That's pretty narrow, and this structure is a contract that you have to maintain basically forever. But your UI really wants to show the result as a movie. That's what the profile link I've added is for: the client uses the profile link to know what kind of resource it's currently processing and what fields it can use to build up a UI. A decent client, when processing a collection, displays the profiles it can and just ignores the ones it can't (logging a warning, maybe). It's up to the client dev to upgrade their app to support new profiles. Don't believe me? Think about how a web browser parses tags it doesn't understand in HTML: put a <thing-not-invented-yet></thing-not-invented-yet> in your HTML doc and see how a good client works.
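
    A rough Python sketch of that client behaviour follows. The renderer table and the function names are hypothetical, and it assumes each embedded item carries a single profile link; only the profile URL and the title/rating fields come from the example document above.

```python
import logging

logger = logging.getLogger("hal-client")


def render_movie(resource):
    # Fields the (assumed) result-movie profile promises to provide.
    return f"{resource['title']} ({resource['rating']})"


# Hypothetical table of the profiles this client knows how to display.
RENDERERS = {"https://host.com/result-movie": render_movie}


def render_items(document):
    """Render every embedded item whose profile is known; skip the rest."""
    rendered = []
    items = document.get("_embedded", {}).get("item", [])
    if isinstance(items, dict):  # a rel may hold one object instead of a list
        items = [items]
    for resource in items:
        profile = resource.get("_links", {}).get("profile", {}).get("href")
        renderer = RENDERERS.get(profile)
        if renderer is None:
            # Unknown profile: ignore it, the way a browser ignores unknown tags.
            logger.warning("skipping item with unsupported profile %r", profile)
            continue
        rendered.append(renderer(resource))
    return rendered
```

    Supporting a new kind of result then means adding one entry to the renderer table, without touching the loop.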

    Another thing you should notice is that I do NOT use self links but canonical. I've changed my stance on this over the years. As of late I default to canonical, and I ONLY use self when I'm maintaining versions of the target resource and it's important to have the embedded object link to the specific version that was embedded. This is very rare in my experience. I tell clients to follow self if it's present, otherwise follow canonical, when navigating to the embedded item. This gives the server complete control over where it wants to take the client. The top-level resource, in this case a result, should still have a self, and here it makes sense because some random search probably does not have a canonical link. Unless it's a VERY common keyword search, in which case it probably should, as other users could use the same URL.
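
    The "follow self if present, otherwise canonical" rule could be sketched like this in Python. The function name is made up for illustration, and it assumes each of the two rels holds a single link object:

```python
def navigation_href(resource):
    """Return the URL to follow for an embedded resource.

    Prefers `self` when present, otherwise falls back to `canonical`.
    Returns None when neither link exists.
    """
    links = resource.get("_links", {})
    for rel in ("self", "canonical"):
        link = links.get(rel)
        if isinstance(link, dict) and "href" in link:
            return link["href"]
    return None
```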

    Let's take a moment to talk about why item is the rel, because this is really important. Since it's a search result, why not have the rel be result? There's a really easy answer: result is not a member of the IANA link relations registry https://www.iana.org/assignments/link-relations/link-relations.xhtml and therefore result is completely invalid. Now you could "namespace" your extension rel as my:result or our:result (the "namespace" is up to you, these are just examples), but why bother with that if a perfectly good one already exists in the IANA registry? And one does: item.
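
    As a minimal sketch of that rule, a server-side check might look like the following. The set below is a small, deliberately incomplete subset of the IANA registry, the function name is hypothetical, and treating anything of the form prefix:name as an acceptable extension rel is a simplification:

```python
# A small, deliberately incomplete subset of the IANA link relations registry.
REGISTERED_RELS = frozenset({
    "self", "item", "collection", "canonical", "profile",
    "next", "prev", "first", "last", "up",
})


def is_acceptable_rel(rel):
    """True for a registered rel, or for a prefixed ("namespaced") extension rel."""
    if rel in REGISTERED_RELS:
        return True
    # Extension rels need a non-empty prefix and name, e.g. "my:result".
    prefix, sep, name = rel.partition(":")
    return bool(sep and prefix and name)
```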

    Let's talk about items vs item (or x:movies vs x:movie). Well, items isn't in IANA either, so it'd have to be x:items, but instead of doing that let's think about why. If our result doc were represented in HTML it'd look like this (for brevity, ignore the missing head, body, etc. and the lack of well-formedness):

    <html>
      <title>Results for search XXXXX</title>
      <a rel="item" href="https://host.com/url/to/movie/result/one" >A Great Movie</a>
      <a rel="item" href="https://host.com/url/to/movie/result/two" >A Terrible Movie</a>
    </html>
    

    This is the SAME resource as the first example (without embedding sub-resources), just represented as text/html and not application/hal+json. If I've lost you here (this is where most people get REALLY confused), the best I can offer is to watch my talk on this at https://www.youtube.com/watch?v=u_pZBBELeEQ. Here it's clear that the appropriate relationship of each target resource is a SINGLE ITEM and not a set of ITEMS. Each link targets one item (or one, singular movie).

    There's a trap with HAL, which is to treat it like plain JSON, and that leads to statements like the one in the comments that movies is machine readable or better. Let me explain how this comes about by continuing with this HTML representation in a use case. When a client parses this document looking for item links, it must parse EVERY a tag and filter down to only those where the rel="item" attribute is present. That's a "full table scan", and how do we get away from those? We create an index. JSON has the concept of an index built into its structure: a key with an array value, index : [ {entry 1}, {entry 2} ]. The author of HAL knew the most common way to retrieve links (in _links, or the pre-fetched ones in _embedded) would be by relationship, so he structured his spec such that rel is indexed. So when you see:

       "_links": {
          "self": { "href": "https://host.com/search/with/params/XXXXX" },
          "item": [
             { "href": "https://host.com/url/to/movie/result/one", "title": "A great Movie" },
             { "href": "https://host.com/url/to/movie/result/two", "title": "A Terrible Movie" }
          ]
       }
    

    know that it is REALLY

       "_links": {
          "self": { "rel": "self", "href": "https://host.com/search/with/params/XXXXX" },
          "item": [
             { "rel": "item", "href": "https://host.com/url/to/movie/result/one", "title": "A great Movie" },
             { "rel": "item", "href": "https://host.com/url/to/movie/result/two", "title": "A Terrible Movie" }
          ]
       }
    

    because the rel is an attribute of the LINK OBJECT and NOT THE RESOURCE. But bytes over HTTP are expensive (gzip would get rid of this) and devs don't like redundancies (a whole other topic), so in HAL we OMIT the rel attribute, since the HAL structure already makes the rel apparent. Though it's not really apparent when your parser encounters just this:

    { "href": "https://host.com/url/to/movie/result/one", "title": "A great Movie" }
    

    What's the rel? You have to pass that in from the parent node. That's always been ugly. Anyway, all this is to show that redundancy is generally eliminated in HAL. Once this redundancy is eliminated, it's tempting to change that index key to the plural form items, but know that would mean you are saying your link (once redundancies are PUT BACK) would be {rel: "items", href : "https://host.com/url/to/movie/result/one", title : "A great Movie"} and that is clearly wrong: that link is not to many items, just one.
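
    To make the "put the redundancy back" step concrete, here is a small Python sketch that flattens a HAL _links object into link objects that each carry their own rel. The function name is made up for illustration:

```python
def expand_links(links):
    """Flatten a HAL `_links` object into link objects that carry their rel."""
    flat = []
    for rel, value in links.items():
        # A rel may map to one link object or to a list of them.
        targets = value if isinstance(value, list) else [value]
        for link in targets:
            # The index key becomes the rel of every link stored under it.
            flat.append({"rel": rel, **link})
    return flat
```

    Run on a _links object keyed by items, every flattened link would come out with rel "items", even though each one points at a single movie.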

    So removing redundancy in this case probably wasn't the best, but it's an evil with benefits, and HAL follows that pattern for _links and _embedded, and that's what we're going to do with our search result. Given that ALL the item links have now been pre-fetched and are present in _embedded, it's unimportant to keep them in _links. As such it should look like this:

    {
       "title": "Results for search XXXXX",
       "_links": {
          "self": { "href": "https://host.com/search/with/params/XXXXX" }
       },
       "_embedded": {
          "item": [
             {
                "_links": {
                   "profile": { "href": "https://host.com/result-movie" },
                   "canonical": { "href": "https://host.com/url/to/movie/result/one" }
                },
                "title": "a great movie",
                "rating": "PG"
             },
             {
                "_links": {
                   "profile": { "href": "https://host.com/result-movie" },
                   "canonical": { "href": "https://host.com/url/to/movie/result/two" }
                },
                "title": "a terrible movie",
                "rating": "G"
             }
          ]
       }
    }
    

    Now we have a pretty good search result that includes 2 movies (and can include more things in the future without breaking the contract). Note: if you ever went live with JUST _links and no _embedded, you can NOT remove the _links, as some client out there is depending on them being present. So it's best to think about this stuff early. Though a well-behaved client should always check _embedded before _links when using the HAL representation of a resource, so it's really up to you to know whether all your clients are well behaved.
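
    A well-behaved client along those lines might resolve a rel roughly as sketched below. The function name is hypothetical, and fetch stands for whatever caller-supplied function performs the HTTP GET and parses the response:

```python
def resolve(document, rel, fetch):
    """Return the resources for `rel`, using `_embedded` before `_links`.

    `fetch` is a caller-supplied function (hypothetical here) that takes an
    href and returns the parsed HAL document behind it.
    """
    embedded = document.get("_embedded", {})
    if rel in embedded:
        # Pre-fetched by the server: no extra requests needed.
        value = embedded[rel]
        return value if isinstance(value, list) else [value]
    # Not embedded: fall back to following each link (the N+1 case).
    value = document.get("_links", {}).get(rel, [])
    links = value if isinstance(value, list) else [value]
    return [fetch(link["href"]) for link in links]
```

    A client written this way keeps working whether the server embeds the items, only links to them, or does both.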

    Ok, so let's move to a case where x:movie is the correct relationship. That would probably be good if the top-level resource is an actor. So something like:

    {
       "Name": "Paul Bettany",
       "_links": {
          "canonical": { "href": "https://host.com/paul-bettany" },
          "x:movie": [
             { "href": "https://host.com/url/to/movie/result/one", "title": "A great Movie" },
             { "href": "https://host.com/url/to/movie/result/two", "title": "A Terrible Movie" }
          ],
          "x:spouse": { "href": "", "title": "Jennifer Connelly" }
       },
       "_embedded": {
          "x:movie": [
             {
                "_links": {
                   "profile": { "href": "https://host.com/result-movie" },
                   "canonical": { "href": "https://host.com/url/to/movie/result/one" }
                },
                "title": "a great movie",
                "rating": "PG"
             },
             {
                "_links": {
                   "profile": { "href": "https://host.com/result-movie" },
                   "canonical": { "href": "https://host.com/url/to/movie/result/two" }
                },
                "title": "a terrible movie",
                "rating": "G"
             }
          ]
       }
    }
    

    Notes: I used canonical instead of self at the top level because an actor is a long-lived resource: that actor will always exist, and an actor is not versioned. For completeness I left x:movie in both _links and _embedded; in practice, however, I would NOT have the ones in _links. I also kept them in _links to show that the reason to have x:movie is so that you can differentiate it from x:spouse (that semantic differentiation did NOT make sense in the search result case we started with). Finally, it's useful to note that I embedded x:movie but NOT x:spouse. This is just to illustrate that it is not an either/or thing: you can pre-fetch/embed the links you need for your use case. In fact I often embed things based on the identity of the client, i.e. I know iOS can display something that Android cannot.

    Those notes aside, the reason I went here is that I wanted to make it clear that you do NOT and SHOULD NOT have that movies: data field that you have. Just rely on the movie data in _embedded. You said something like matching up the values in movies to the ones in _links or _embedded. You should NOT be doing that; it doesn't make any sense. A movie is a resource: use the linked resource of a movie, not some data field. You need to decide early on what is a resource and what is a piece of data. My best tip is: if a thing has link relationships, then it's a resource. In my talk I go into MUCH MORE DETAIL on this with broader terms (hypermedia controls) that I don't want to get into here yet.

    A final note: in hypermedia applications you KNOW you are doing something wrong if you are exposing internal id fields, as you have done here. That should be a huge red flag. The use case for the ids you described was to match up the data field movies with the _embedded x:movie. As stated, you should NOT be doing that, and the presence of an id field should clue you in to that bad practice.

    I was asked to answer here, so I hope this helps.