Hypermedia in RESTful applications

Posted by Mark Baker on Jan 28, 2008 |

Hypermedia WTF?!

Chances are that if you've heard of the REST architectural style, you've also heard about what some consider its most important constraint, the uniform interface, and in particular the aspect of that interface that constrains the methods that can be invoked on resources. What you may not realize, though, is that there's quite a bit more to the uniform interface than that. In particular, there's a sub-constraint that goes by the unwieldy name of "Hypermedia as the engine of application state", which is arguably the most important constraint of REST in the sense that it alone provides the bulk of the "shape" of RESTful systems as we know them.

In this article we'll do the deep dive on this constraint, trying to figure out what it means, and understanding its value.

The definition

Unfortunately, the REST dissertation doesn't expand much upon this constraint beyond providing its name plus a description of what it looks like in action:

The model application is therefore an engine that moves from one state to the next by examining and choosing from among the alternative state transitions in the current set of representations.

While that provides us a useful description, it doesn't, in my opinion, help us really understand the scope of the constraint itself; what exactly it allows, and therefore, what it disallows. To start then, it might be worth seeing what information we can extract from the name of the constraint itself.

"application state" refers to the state that determines "where" the user is in the process of completing a task. For example, when doing personal banking, is the user currently viewing account balances, filling in a bill payment form, or about to order new cheques? Those are each different application states. Some people mistakenly believe that "state" here refers to resource state, which would include, in this example, the balance of the accounts or the list of recent payments made. That isn't the case.

Application state is also known by the name "session state", and is the kind of state referred to by REST's "stateless" constraint, which requires that the client maintain it exclusively. In contrast, if you were using a remote session technology like VNC or Windows Remote Desktop, then the application state is kept entirely on the server.

The word "hypermedia" was coined by Ted Nelson in 1962 as a generalization of "hypertext", an invention of his. Whereas hypertext entailed interlinked textual documents, hypermedia expanded the scope to any form of media. The key with both, of course, is the embedding of links in the content we use.
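To make the two halves of the name concrete, here is a minimal sketch of an "engine of application state", not a definitive implementation. The in-memory `REPRESENTATIONS` table, the URIs, and the link relation names are all hypothetical stand-ins for a real server; the banking states are borrowed from the example above. The point to notice is that the client holds the application state (where it currently is) and only ever moves to URIs it found in the current representation.

```python
# Hypothetical in-memory "server": each URI maps to a representation that
# names the current state and lists the transitions (links) on offer.
REPRESENTATIONS = {
    "/accounts": {
        "state": "viewing account balances",
        "links": {"pay-bill": "/payments/new", "order-cheques": "/cheques/new"},
    },
    "/payments/new": {
        "state": "filling in a bill payment form",
        "links": {"accounts": "/accounts"},
    },
    "/cheques/new": {
        "state": "ordering new cheques",
        "links": {"accounts": "/accounts"},
    },
}


def get(uri):
    """Stand-in for an HTTP GET; returns the representation at uri."""
    return REPRESENTATIONS[uri]


def run(start_uri, choices):
    """Follow the link relations named in choices, one after another.

    The application state (the current URI) lives only in this function,
    that is, on the client. The server just hands out representations.
    """
    uri = start_uri
    visited = [get(uri)["state"]]
    for rel in choices:
        links = get(uri)["links"]
        uri = links[rel]  # only URIs found in the current representation are used
        visited.append(get(uri)["state"])
    return visited


print(run("/accounts", ["pay-bill", "accounts", "order-cheques"]))
```

Nothing in `run` knows how the URIs are built; if the hypothetical server renamed `/payments/new`, the client would keep working as long as the link relation stayed the same.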

The constraint in action

REST started getting some traction with developers working with Internet-facing services in 2003/2004 - at least in the sense of them actually associating their services with the moniker "REST" - most visibly with two high-profile, self-described "REST APIs": Flickr and Amazon. Interestingly, both services also offered parallel SOAP-based interfaces, but neither saw much use compared to the "REST API". As a result, the REST community embraced these services and used them as a means to further explain the value and appeal of using the REST style on the Web. Unfortunately, there was a problem with these APIs: they weren't fully RESTful, as they were disregarding (at least) one of REST's constraints. In fact, there's a lot more to the problems with the Flickr API (as well as Amazon's, del.icio.us's, and several others) than we can cover here, so we'll just stick to those problems that relate to hypermedia.

Luckily, we need not look far for those problems. Take some sample data returned from the "flickr.contacts.getList" operation, which a user can use to get a list of their own contacts:

<contacts page="1" pages="1" perpage="1000" total="3">
<contact nsid="12037949629@N01" username="Eric" iconserver="1"
realname="Eric Costello"
friend="1" family="0" ignored="1" />
<contact nsid="12037949631@N01" username="neb" iconserver="1"
realname="Ben Cerveny"
friend="0" family="0" ignored="0" />
<contact nsid="41578656547@N01" username="cal_abc" iconserver="1"
realname="Cal Henderson"
friend="1" family="1" ignored="0" />
</contacts>

There, the "nsid" attribute contains a value which is a unique identifier for an individual contact; in this case there are three of them. But once a client has retrieved that document, then what? What if they want to know more about Cal Henderson? A quick check of the Flickr API documentation reveals an operation called "flickr.people.getInfo" which takes an nsid as an argument, and returns more information about the person identified by that nsid string. So the required URI which we can use in an HTTP GET message to find out more about Cal would be:

http://api.flickr.com/services/rest/?method=flickr.people.getInfo&auth_key=xxxx&user_id=41578656547@N01
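To see how much out-of-band knowledge that step needs, here is a rough sketch of the client code it implies. The function name `person_info_uri` is hypothetical, and the endpoint, operation name, and parameter names are simply copied from the URI above rather than checked against the Flickr documentation. Every one of those strings has to be compiled into the client.

```python
from urllib.parse import urlencode

# All of the constants below are Flickr-specific knowledge that the client
# has to be given in advance; none of it comes from the contacts document.
ENDPOINT = "http://api.flickr.com/services/rest/"
PERSON_INFO_METHOD = "flickr.people.getInfo"


def person_info_uri(nsid, auth_key="xxxx"):
    """Build the person-info URI from an nsid taken from the contacts list."""
    query = urlencode(
        {"method": PERSON_INFO_METHOD, "auth_key": auth_key, "user_id": nsid},
        safe="@",  # keep "@" readable, as in the article's example URI
    )
    return ENDPOINT + "?" + query


print(person_info_uri("41578656547@N01"))
```

The only thing the server contributed here is the nsid string; everything else is hardwired.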

This is not hypermedia. A hypermedia solution would have used standardized identifiers - URIs, for the Web - instead of proprietary ones, thereby avoiding the need for Flickr-proprietary knowledge for a client to go from a document with a list of people to a document about one of those people. If standardized identifiers were used, then that first document would look something like this:

<contacts page="1" pages="1" perpage="1000" total="3">
<contact nsid="http://api.flickr.com/services/rest/?method=flickr.people.getInfo&amp;auth_key=xxxx&amp;user_id=12037949629@N01" username="Eric" iconserver="1"
realname="Eric Costello"
friend="1" family="0" ignored="1" />
<contact nsid="http://api.flickr.com/services/rest/?method=flickr.people.getInfo&amp;auth_key=xxxx&amp;user_id=12037949631@N01" username="neb" iconserver="1"
realname="Ben Cerveny"
friend="0" family="0" ignored="0" />
<contact nsid="http://api.flickr.com/services/rest/?method=flickr.people.getInfo&amp;auth_key=xxxx&amp;user_id=41578656547@N01" username="cal_abc" iconserver="1"
realname="Cal Henderson"
friend="1" family="1" ignored="0" />
</contacts>
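With a document like that, the client side can shrink to something like the following sketch. The helper name `contact_links` is hypothetical, the sample is cut down to a single contact, and the query string is assumed to be written with "&" separators (escaped as "&amp;" inside the XML attribute). The client treats the attribute value as an opaque URI and never assembles it from parts.

```python
import xml.etree.ElementTree as ET

# Trimmed-down version of the hypermedia contacts document, one contact only.
CONTACTS_XML = """\
<contacts page="1" pages="1" perpage="1000" total="1">
  <contact nsid="http://api.flickr.com/services/rest/?method=flickr.people.getInfo&amp;auth_key=xxxx&amp;user_id=41578656547@N01"
           username="cal_abc" realname="Cal Henderson" />
</contacts>
"""


def contact_links(xml_text):
    """Map each contact's real name to the URI given in the document."""
    root = ET.fromstring(xml_text)
    return {c.get("realname"): c.get("nsid") for c in root.iter("contact")}


links = contact_links(CONTACTS_XML)
# The next request is simply a GET on whatever URI the server supplied.
print(links["Cal Henderson"])
```

Note what is missing compared with the proprietary approach: no endpoint constant, no operation name, no parameter names. If the server moved the person resource to a different URI, this client would follow along unchanged.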

Ok, but what would making this change to hypermedia gain them or their users?

Flickr's current approach of requiring that clients possess Flickr-specific knowledge in order to progress from one application state to another is simply another way of saying that they have a proprietary application model. Not only is it proprietary, but it's not even a consistent model within the Flickr API itself, as the knowledge needed to go from a list of contacts to information about one of the contacts (as above) is different from the knowledge needed to go from a contact to that contact's list of photos (which is "flickr.photos.getContactsPublicPhotos"). This presents evolvability problems for Flickr, as even simple extensions to the API can easily require new knowledge to be disseminated, in turn requiring changes to client code. A generic client such as a search engine would be unable to index Flickr content via this API, as I'm sure that the maintainers of the search engine would have little interest in upgrading their software each time Flickr - or anybody else using such an application model - extended their API. Again, this isn't specific to hypermedia: any standardized application model would provide the same benefits. Of course, the hypermedia model has proven itself quite popular, even if those using it didn't realize that's what they were doing.

So by using a common application model, one that is not just standardized, but fixed for all time, you are reducing coupling between consumer and producer by permitting each to evolve independently of the other. This way, old and new services can be combined together into a composite application, and old and new clients can be the ones doing that combining. I suppose we take it for granted in our use of the Web that one can simply include a link in a document to a page authored years ago and a consumer of that content can seamlessly navigate between them without having to download a new version of the browser. That's by design, not by accident.

It should also be noted that the Web by no means has a monopoly on the use of hypermedia. Another pervasive application we all use daily on the Internet does too: email. Every email message includes headers which carry email addresses for the sender and the recipients, and possessing one or more of those is sufficient information with which to send another email message.

While we're discussing little-known facts, you might also be interested to know that an important aspect of the Web itself does not use hypermedia: robots.txt, aka robot exclusion. How it works is that sites that wish to exclude search engines from indexing some of their content simply place a file at their "/robots.txt" URI which describes what shouldn't be indexed. The thing is, though, that as far as I know, hardly anybody links to a robots.txt file. And why should they? "/robots.txt" is a fixed and well-known location, especially to search engines: given any URI, they can construct from it the corresponding robots.txt URI for that domain. This is not hypermedia, though, because the link isn't dynamically discovered in another page; it's known a priori by the search engine. That's not to say this is a bad solution, because the hypermedia approach would have required two network round-trips (one to discover a page which linked to robots.txt, and another to grab it), which would be a burden to all parties. So there you have a good example of the cost of hypermedia. Keep in mind, though, that there are few cases where a fixed, well-known location is really the best approach. Sitemaps might be one, for the same reasons as robots.txt, but others such as "favicon.ico" and Apple's new iPhone WebClip feature would likely have benefited somewhat from the use of hypermedia; for example, those icons would be picked up by an image search engine without updating the search engine software.
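The a priori construction that a crawler performs can be sketched in a few lines. The function name `robots_txt_uri` is hypothetical; the fixed "/robots.txt" path is the well-known location described above.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_uri(uri):
    """Derive the robots.txt URI for the site that serves the given URI.

    Nothing is discovered here: the "/robots.txt" path is knowledge built
    into the client, which is why this is not hypermedia.
    """
    parts = urlsplit(uri)
    # Keep scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_uri("http://example.com/photos/cal?page=2#top"))
```

One function call replaces the extra round-trip a link-discovery approach would need, at the price of fixing the path forever.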

Another technology that deserves some attention when considering hypermedia is WADL, the Web Application Description Language. Though a self-described "RESTful description language", there's an important caveat. Consider this example snippet from a WADL file:

<resources base="http://service.example.com/myservices/">
<resource path="search">
<method name="GET" id="search">
<request>
<param name="query" type="xsd:string" style="query" required="true"/>
</request>
<response>
<representation mediaType="application/xml" element="yn:ResultSet"/>
<fault status="400" mediaType="application/xml" element="ya:Error"/>
</response>
</method>
</resource>
</resources>

The file declares a "search" resource as part of a collection of "myservices". It describes, via the declaration of use of HTTP GET and of the "query" parameter, how a client can construct a URI given an input string of its choosing.

On the face of it, this appears to be a perfectly RESTful, hypermedia-based solution, very similar to how an HTML form (or a URI Template) is used. So what's the caveat? It's the issue of when the WADL is consumed. Some Web services proponents who have taken an interest in Web-based solutions have been using WADL as they would use WSDL, as a design-time artifact. Using WADL this way, though, is akin to developing a Web browser with built-in knowledge of, say, the Google homepage HTML form at the time the browser was compiled: if Google changes the form in a backwards-incompatible way (i.e. not just adding a new optional parameter) after the browser is deployed, then the browser will not be able to use that resource/service. Using the hypermedia constraint with WADL means that the client should consume the WADL at runtime. So be careful when choosing your WADL tooling, as some tools that try to help you actually don't.
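As a rough illustration of runtime consumption, the sketch below reads a WADL description and builds the request URI from it, the way a browser reads a form before submitting it. Several things here are assumptions: the helper name `build_uri` is hypothetical, the description is embedded as a string where a real client would fetch it at runtime, and it mirrors the namespace-free snippet above (a complete WADL document would declare an XML namespace, which the lookups would need to account for).

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode, urljoin

# Stand-in for a description retrieved from the server at runtime.
WADL = """\
<resources base="http://service.example.com/myservices/">
  <resource path="search">
    <method name="GET" id="search">
      <request>
        <param name="query" type="xsd:string" style="query" required="true"/>
      </request>
    </method>
  </resource>
</resources>
"""


def build_uri(wadl_text, method_id, **values):
    """Construct a request URI using only what the description declares."""
    root = ET.fromstring(wadl_text)
    for resource in root.iter("resource"):
        for method in resource.iter("method"):
            if method.get("id") != method_id:
                continue
            params = {}
            for param in method.iter("param"):
                if param.get("style") != "query":
                    continue  # this sketch handles query parameters only
                name = param.get("name")
                if name in values:
                    params[name] = values[name]
                elif param.get("required") == "true":
                    raise ValueError(f"missing required parameter: {name}")
            base = urljoin(root.get("base"), resource.get("path"))
            return base + "?" + urlencode(params)
    raise LookupError(f"no method with id {method_id!r}")


print(build_uri(WADL, "search", query="hypermedia"))
```

Because the base URI, path, and parameter names are read from the description each time, a server that changes any of them only needs to publish a new description. A client generated from the WADL at design time would have those same values frozen into its code.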

Conclusion

Hopefully the value of the hypermedia constraint is a little more apparent now than it might have been. More than that though, what I really hope is that you're better able to know what practices you need to avoid when you've decided to use it.

Keep in mind that hypermedia is one of the uniform interface constraints, so the latter's more general litmus test applies: if you're developing client-side code which makes assumptions which aren't true of all resources (or server-side "APIs" which require this of clients), then you're not using the uniform interface.

About the author

Mark Baker is well-known in the SOA and Web services community because of his continuous efforts to promote an architectural style called REST (REpresentational State Transfer), criticizing many of the standards and specifications as being ignorant of what made and continues to make the Web successful.


Bad example by Steve Jones

Yes you can download a page from the link, but this doesn't mean you will be able to read it. After all the "page" could be a ".doc" file or an old Word Perfect file. Even if it's modern HTML with CSS then there is little chance of it being legible on an old NCSA Mosaic browser.

Just because I can link to something doesn't mean it's usable. The assumption above appears to be that the "hyper" is more important than the "media".

Re: Bad example by Stefan Tilkov

True. But I can handle anything that has a media type I know (and have implemented logic for), which makes data and operations orthogonal concerns.

Re: Bad example by Steve Jones

This isn't quite true though either. For instance you might think you can handle video, but then find out you don't have the right codec. This is the same as the Mosaic "I can handle HTML" problem.

Basically what you are saying is that in order to handle something you have to know about it beforehand. I completely agree, but that isn't a run-time thing.

Resource States by Jean-Jacques Dubray

Mark:

>> Some people mistakenly believe that "state" here refers to resource state, which would include, in this example, the balance of the accounts or the list of recent payments made. That isn't the case.

I am sorry, but you seem to not understand the difference between the "content" of a resource and its "state". The examples such as "account balance" and list of transactions are "content" not states.

States of an account include "opened", "blocked", "closed"... They have nothing to do with application state, even though the actions that are allowed on the account depend on the state of the account.

I am disappointed by your article, precisely because you are not talking about the relationship between application state (i.e. an engine that moves from one state to the next by examining and choosing from among the alternative state transitions in the current set of representations) and the state of the underlying resources.

This is a much more difficult problem to deal with when compared to the problem of encoding "Query-by-example" behind a URI. QBEs are by no means are related to application "state" (i.e. the actions a user may take on a resource) but simply information navigation. You don't perform work when you navigate. QBEs are well addressed by REST and the Web. As you pointed out one can model the access to all the resources of my application with URI encoded QBEs. The biggest benefits include: they may become searchable and navigation links can be added to any of them without particular knowledge of their overall interface.

You also did not speak about "events". There is no event model in REST. I have read recently someone claiming that it's okay to "GET the resource representation until something changes". BTW, this is what search engines do, they roam the Web for changes. Not very efficient, if you ask me.

When a resource changes "state", there is no practical way for a resource representation consumer to know about it. This reduces the application of REST to very short lived interactions when the state of the resource is not likely to change (and therefore the actions that are enabled don't change either).

At the end of the day, the interface of a resource includes:
- QBEs
- Inter-Actions (to transition from state to another)
- Events

REST deals only with the first one, not the other two.

JJ-

Re: Resource States by Mark Baker

JJ, the subject of the article is hypermedia, not the relationship of application & resource state, nor "events", so please don't fault me for staying on topic! 8-O

Also, the word "state" is well-defined in computer science and encompasses all the uses I've made of it in the article, including your different states of an account.

I do have to disagree with you when you say "QBEs are by no means are related to application "state"", because with REST they are exactly that as a "query" is just a GET of a parameterized URI that was constructed based on directions previously provided by the server (as a result of a prior GET). Other architectural styles are free to do query differently of course.

Re: Resource States by Jean-Jacques Dubray

Ok, fair enough.

thanks,

JJ-

Re: Bad example by Patrick Logan

The fewer choices you have and the more your system knows about the choices it does have, the more usable this feature. Yes?

I think the point is fair -- this is not Big AI. This is making the most out of what you've got at hand. The alternative is a lot more special purpose, one-off coding.

Re: Resource States by Stefan Tilkov

JJ, it seems your definition of "resource" is a different one than Mark's (or the REST dissertation's). The distinction of "content" and "state" is simply not part of this definition, which I don't see as a shortcoming, just as a difference.

Regarding the use of polling to do eventing, I have to say that I disliked it at first, having been used to things such as JMS pub/sub, CORBA Event service, etc. But in conjunction with HTTP's caching/conditional GET support, querying for changes scales massively and is a great solution if you can live with the latency. For anything but high-performance/soft realtime scenarios, I now prefer to simply publish Atom feeds because of the simplicity and scalability.

Re: Resource States by Jean-Jacques Dubray

thanks, this is important information because as you mentioned it is not intuitive that it would scale.

JJ-

Re: Bad example by Steve Jones

Is it more coding? Or is it actually the same amount of coding just done in another way? You don't actually have fewer choices just because you have a reduced vocabulary, that can just mean it's more complex to do the same thing because it takes more interactions.

If we are talking about dynamically changing complex systems, such as VMI or Financial Trading then the phrase "big AI" is probably exactly what we are talking about.

Useful media types by Mike Kelly

This may be of interest to readers:

I've written a media type called HAL that can be used for APIs similar to the way HTML is used for web apps, read more here:

stateless.co/hal_specification.html

The JSON variant is now in the process of being turned into an Internet-Draft, and the latest version is available here:

raw.github.com/mikekelly/hal-rfc/master/draft-k...

Hopefully it's of use to someone!

Cheers,
M

InfoQ.com and all content copyright © 2006-2014 C4Media Inc. InfoQ.com hosted at Contegix, the best ISP we've ever worked with.