Decoupling REST URLs from Code using NetKernel Grammars

Posted by Randolph Kahle on Dec 15, 2009

Accessing data and services over the World Wide Web and HTTP is challenging. Many designs have tried to leverage the Web and HTTP to offer efficient, concise and versionable systems, most under the umbrella of Service-Oriented Architecture. One approach that has recently gained a lot of attention, REST, relies on URLs to identify services and information. However, the Web is a dynamic, constantly changing information environment in which new content and URLs are added all the time, whereas implementation code (particularly code that has been deployed and is publicly accessible) is harder to change without causing problems for developers, system administrators and users. What is needed is a mechanism that fits between the potentially fluid world of URLs and the more static world of compiled and deployed code. Such a mechanism must bind URLs to service implementation code and must buffer and isolate that code from changes in the URLs.

In software, a formal grammar defines the syntactic structure of textual information, such as a program, a data file, or a URI. A program can use a grammar to recognize whether a piece of text adheres to the defined syntax and to parse that text into its parts. A program can also use a grammar to generate text that adheres to the syntactic rules. The following diagram illustrates a Recognizer/Parser program using a supplied grammar to parse the string "Part1Part2Part3" and assign the parts ("Part1", "Part2" and "Part3") to three variables.
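As a rough illustration of recognizing and parsing, the sketch below uses a Java regular expression with named groups as a stand-in for a grammar. The class name PartsRecognizer, the group names, and the assumption that each part looks like "Part" followed by a digit are hypothetical; this is not the NetKernel grammar engine.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PartsRecognizer {
    // Hypothetical grammar: three consecutive tokens, each "Part" followed by one digit.
    private static final Pattern GRAMMAR =
        Pattern.compile("(?<first>Part\\d)(?<second>Part\\d)(?<third>Part\\d)");

    /** Returns the three parsed parts, or null when the text is not recognized. */
    static String[] parse(String text) {
        Matcher m = GRAMMAR.matcher(text);
        if (!m.matches()) {
            return null; // the text does not adhere to the grammar
        }
        return new String[] { m.group("first"), m.group("second"), m.group("third") };
    }

    public static void main(String[] args) {
        // Recognized: the three parts are assigned to separate variables.
        String[] parts = parse("Part1Part2Part3");
        System.out.println(String.join(", ", parts));

        // Not recognized: only two parts are present.
        System.out.println(parse("Part1Part2") == null);
    }
}
```

Recognition is the yes/no answer from matches(); parsing is the extraction of the named groups once the text has been recognized.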

While developing NetKernel 4 [1] we realized that a grammar-based recognition and parsing technology could be used to process request identifiers and simplify software development on the platform. The NetKernel 4.0 Grammar technology [2] is a bi-directional mapping mechanism that implements this idea; it will both parse an identifier into parts and build an identifier from supplied parts. When implementing REST web services, the NetKernel 4 grammar technology can be used to recognize web service identifiers and bind them to web service implementation code.

Outside -> In

To start, we will look at the use of a grammar-based parser to handle information coming from the outside, in the form of a REST web service identifier, and convert the parsed identifier text into values associated with internal named arguments. In our example we will use one of the Twitter REST web service APIs [3], which has the following general form:

  http://www.twitter.com/statuses/user_timeline/{user-id}.{representation-type}

This diagram illustrates the use of a grammar-driven parser to recognize the Twitter web service identifier, parse the user identification and representation information, and assign that information to the named arguments twitterID and representationType.

The following NetKernel grammar will recognize this set of identifiers:

  <grammar>
    http://www.twitter.com/statuses/user_timeline/
    <group name="twitterID"><regex type="alphanum"/></group>
    .
    <group name="representationType"><regex>(xml|json)</regex></group>
  </grammar>

The grammar includes fixed text ("http://..." and ".") as well as two groups. Each group defines a section of the identifier that is to be recognized using a regular expression. Because each group has a name attribute, the grammar engine will assign the parsed text portion of the identifier to the specified named argument. For example, the second group will recognize either a trailing "xml" or "json" and assign that value to the named argument representationType.

The following table illustrates how the grammar directs the parsing of example identifiers:

  URI                                                          twitterID   representationType
  http://www.twitter.com/statuses/user_timeline/demo1060.xml   demo1060    xml
  http://www.twitter.com/statuses/user_timeline/pjr1060.json   pjr1060     json
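To make the parse direction concrete outside NetKernel, here is a minimal sketch that approximates the Twitter grammar with a plain Java regular expression. The class name TwitterIdentifierParser is hypothetical, and treating type="alphanum" as [A-Za-z0-9]+ is an assumption; the real grammar engine may define it differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TwitterIdentifierParser {
    // Fixed text is quoted literally; each named group mirrors a <group> in the grammar.
    // Assumption: type="alphanum" is approximated as [A-Za-z0-9]+.
    private static final Pattern TIMELINE = Pattern.compile(
        Pattern.quote("http://www.twitter.com/statuses/user_timeline/")
        + "(?<twitterID>[A-Za-z0-9]+)"
        + "\\."
        + "(?<representationType>xml|json)");

    /** Returns the named arguments, or an empty Optional if the identifier is not recognized. */
    static Optional<Map<String, String>> parse(String identifier) {
        Matcher m = TIMELINE.matcher(identifier);
        if (!m.matches()) {
            return Optional.empty();
        }
        Map<String, String> arguments = new LinkedHashMap<>();
        arguments.put("twitterID", m.group("twitterID"));
        arguments.put("representationType", m.group("representationType"));
        return Optional.of(arguments);
    }

    public static void main(String[] args) {
        System.out.println(parse("http://www.twitter.com/statuses/user_timeline/demo1060.xml"));
        System.out.println(parse("http://www.twitter.com/statuses/user_timeline/pjr1060.json"));
    }
}
```

The two identifiers in main are the rows of the table; each yields a map holding the twitterID and representationType values.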

In NetKernel, an endpoint is declared with a grammar and its Java implementation class. In our example, the following endpoint declaration will cause NetKernel to associate the Twitter grammar with an instance of the Java class org.ten60.demo.grammar.UserTimelineAccessor.

  <endpoint>
    <grammar>
      http://www.twitter.com/statuses/user_timeline/
      <group name="twitterID"><regex type="alphanum"/></group>
      .
      <group name="representationType"><regex>(xml|json)</regex></group>
    </grammar>
    <class>org.ten60.demo.grammar.UserTimelineAccessor</class>
  </endpoint>

When an identifier is presented to the endpoint, the endpoint delegates to the grammar engine the job of recognizing and parsing the identifier and assigning portions of the identifier text to twitterID and representationType. Those values are available to the UserTimelineAccessor instance through the context argument of the onSource(...) method. The following Java code [4] is the implementation of the endpoint functioning as a reflection service [5], simply returning the information provided in the identifier:

  package org.ten60.demo.grammar;

  import org.netkernel.layer0.nkf.INKFRequestContext;
  import org.netkernel.module.standard.endpoint.StandardAccessorImpl;

  public class UserTimelineAccessor extends StandardAccessorImpl
  {
    public void onSource(INKFRequestContext context) throws Exception
    {
      // Request the portion of the identifier that provides the Twitter ID
      String userID = context.getThisRequest().getArgumentValue("twitterID");

      // Request the portion of the identifier that provides the representation type
      String repType = context.getThisRequest().getArgumentValue("representationType");

      // Return a representation that simply reflects the information parsed from the identifier
      context.createResponseFrom("Request made for [" + userID + "] with type [" + repType +"]");
    }
  }

Note that the compiled Java code is decoupled from the structural form of the identifier. If the identifier for the service changes, a different grammar can be used to map the new identifier structure to the existing code. For example, let's say that the Twitter service introduces a version 2.0 API that provides a new way to request existing services. If the new API 2.0 URL has the form

  http://www.twitter.com/2.0/user/timeline/status/{twitter-id}.{representation-type}

then the new API can be mapped to the existing Java class with the following endpoint declaration (the group names stay the same because they are the argument names the class requests):

  <endpoint>
    <grammar>
      http://www.twitter.com/2.0/user/timeline/status/
      <group name="twitterID"><regex type="alphanum"/></group>
      .
      <group name="representationType"><regex>(xml|json)</regex></group>
    </grammar>
    <class>org.ten60.demo.grammar.UserTimelineAccessor</class>
  </endpoint>

In NetKernel both endpoints can exist simultaneously and use the same implementation class.
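The decoupling can be sketched in plain Java: two identifier patterns feed one handler, and the handler only ever sees named argument values. The class VersionedRouting, the dispatch method and the regex approximation of the grammars are hypothetical stand-ins, and the sketch assumes both grammars expose the argument names twitterID and representationType that the handler reads.

```java
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class VersionedRouting {
    // Shared tail of both identifiers; assumption: "alphanum" is approximated as [A-Za-z0-9]+.
    private static final String ARGUMENTS =
        "(?<twitterID>[A-Za-z0-9]+)\\.(?<representationType>xml|json)";

    // Two "grammars": the original URL form and the hypothetical 2.0 form.
    private static final List<Pattern> GRAMMARS = List.of(
        Pattern.compile(Pattern.quote("http://www.twitter.com/statuses/user_timeline/") + ARGUMENTS),
        Pattern.compile(Pattern.quote("http://www.twitter.com/2.0/user/timeline/status/") + ARGUMENTS));

    // Stand-in for the endpoint implementation; it never sees the URL structure.
    static String handle(String twitterID, String representationType) {
        return "Request made for [" + twitterID + "] with type [" + representationType + "]";
    }

    /** Tries each grammar in turn and passes the parsed arguments to the single handler. */
    static Optional<String> dispatch(String identifier) {
        for (Pattern grammar : GRAMMARS) {
            Matcher m = grammar.matcher(identifier);
            if (m.matches()) {
                return Optional.of(handle(m.group("twitterID"), m.group("representationType")));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(dispatch("http://www.twitter.com/statuses/user_timeline/pjr1060.json"));
        System.out.println(dispatch("http://www.twitter.com/2.0/user/timeline/status/pjr1060.json"));
    }
}
```

Adding a third URL form would mean adding one more pattern to the list; handle would not change.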

Inside -> Out

Now, let's switch this around. Instead of processing requests from the outside, let's use a grammar to create requests inside our code that will allow us to access an outside service. We again use the Twitter service as our example. To create a request to the Twitter service we first define an endpoint that specifies the Twitter grammar:

  <endpoint>
    <id>twitter:endpoint:status</id>
    <grammar>http://twitter.com/statuses/user_timeline/
      <group name="twitterID"><regex type="alphanum"/></group>
      .
      <group name="representationType"><regex>(xml|json)</regex></group>
    </grammar>
    <request>
      <identifier>res:/foo</identifier>
    </request>
  </endpoint>

The important parts of this endpoint are the id and grammar elements (the request element must be specified but is not used in our example). The grammar element specifies the Twitter grammar that we saw earlier. The id element defines an endpoint identifier that we use in our code to retrieve the grammar. To see how this is done, look at the following code fragment from a NetKernel endpoint implementation:

  String repType = "json";
  String userID =  "pjr1060";
  
  // Create a request that retrieves and binds to the Twitter grammar
  INKFRequest request = context.createRequestToEndpoint("twitter:endpoint:status");

  // Transfer local variable values to the named arguments in the Twitter grammar
  request.addArgument("twitterID", userID);
  request.addArgument("representationType", repType);
  
  // Issue the constructed request to Twitter and capture the response
  INKFResponseReadOnly response = context.issueRequestForResponse(request);

  // Return the response from the external service as our response
  context.createResponseFrom(response);

The following diagram illustrates the request object being bound to the Twitter grammar and constructing an identifier from the supplied parts.
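The build direction can be approximated in plain Java as well. In this sketch the class TwitterIdentifierBuilder is hypothetical, and checking each supplied value against its group's pattern before assembling the identifier is a design choice of the sketch, not a statement about how the NetKernel grammar engine behaves.

```java
import java.util.Map;
import java.util.regex.Pattern;

final class TwitterIdentifierBuilder {
    private static final String PREFIX = "http://twitter.com/statuses/user_timeline/";
    // Assumption: type="alphanum" is approximated as [A-Za-z0-9]+.
    private static final Pattern TWITTER_ID = Pattern.compile("[A-Za-z0-9]+");
    private static final Pattern REPRESENTATION_TYPE = Pattern.compile("xml|json");

    /** Builds an identifier from named argument values, rejecting values the grammar would not parse. */
    static String build(Map<String, String> arguments) {
        String id = require(arguments, "twitterID", TWITTER_ID);
        String type = require(arguments, "representationType", REPRESENTATION_TYPE);
        return PREFIX + id + "." + type;
    }

    private static String require(Map<String, String> arguments, String name, Pattern allowed) {
        String value = arguments.get(name);
        if (value == null || !allowed.matcher(value).matches()) {
            throw new IllegalArgumentException("Invalid or missing argument: " + name);
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(build(Map.of("twitterID", "pjr1060", "representationType", "json")));
    }
}
```

Validating on the way out keeps the two directions symmetric: any identifier this builder produces is one that the corresponding parse pattern would accept.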

Deep Inside

The concept of using a grammar to parse and build identifiers can be taken to the logical extreme deep within software to decouple a requestor and implementor through an associated identifier. In fact, this is exactly how NetKernel works. It borrows the idea of logical / physical decoupling from the Web and moves it inside software. Within a NetKernel system all functions are just like REST web service calls. For example, instead of making a direct API call to an XSLT processing engine, a request is made for the XSLT service using an identifier such as:

  active:xslt+operator@res:/style.xsl+operand@res:/data.xml

This URI uses the active URI scheme [6] and includes the service name, xslt, and two named arguments, operator and operand.
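The structure of such an identifier can be shown with a small parser and builder. This is a simplified sketch: the class ActiveUri is hypothetical, and it assumes argument values contain no "+" characters and no nested active URIs, so it does not cover escaping or nesting.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ActiveUri {
    private static final String SCHEME = "active:";

    final String service;
    final Map<String, String> arguments;

    ActiveUri(String service, Map<String, String> arguments) {
        this.service = service;
        this.arguments = arguments;
    }

    /** Splits "active:service+name@value+name@value" into a service name and named arguments. */
    static ActiveUri parse(String uri) {
        if (!uri.startsWith(SCHEME)) {
            throw new IllegalArgumentException("Not an active URI: " + uri);
        }
        String[] parts = uri.substring(SCHEME.length()).split("\\+");
        Map<String, String> arguments = new LinkedHashMap<>();
        for (int i = 1; i < parts.length; i++) {
            int at = parts[i].indexOf('@'); // the first '@' separates name from value
            if (at < 1) {
                throw new IllegalArgumentException("Expected name@value but found: " + parts[i]);
            }
            arguments.put(parts[i].substring(0, at), parts[i].substring(at + 1));
        }
        return new ActiveUri(parts[0], arguments);
    }

    /** Rebuilds the identifier from the service name and the arguments, in insertion order. */
    String build() {
        StringBuilder sb = new StringBuilder(SCHEME).append(service);
        for (Map.Entry<String, String> e : arguments.entrySet()) {
            sb.append('+').append(e.getKey()).append('@').append(e.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        ActiveUri uri = parse("active:xslt+operator@res:/style.xsl+operand@res:/data.xml");
        System.out.println(uri.service + " " + uri.arguments);
        System.out.println(uri.build());
    }
}
```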

Why do this? Well, the Web is malleable, but physical code is harder to change; if we introduce web-like identifiers for resources and services within our software, then our software systems can take on the properties of the Web.

Nice idea, but any reasonably experienced developer will say that the performance will be ... *$"%^$# ! That is a valid concern, but it misses one of the important properties of the Web - the ability to cache representations. Because real-world systems tend to follow statistical distributions, a relatively small cache of already computed values can dramatically increase overall performance. The tricky part is - for any given system, which values do you cache? This is almost impossible to predict for hand-coded memoization. NetKernel's cache [7] takes a system-wide view and balances itself as the workload changes. So, when repeated requests are made for a resource identifier, the value can be delivered from cache or computed on demand, from any available CPU core.
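To show why identifiers make caching natural, here is a deliberately simple sketch: a small least-recently-used cache keyed by resource identifier. It is not NetKernel's cache, which the article describes as system-wide and self-balancing; the class IdentifierCache and its fixed capacity are hypothetical, and the sketch is not thread-safe.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

final class IdentifierCache {
    private final Function<String, String> compute;
    private final Map<String, String> entries;
    private int computations = 0;

    IdentifierCache(int capacity, Function<String, String> compute) {
        this.compute = compute;
        // Access-ordered map: the least recently used entry is evicted once capacity is exceeded.
        this.entries = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > capacity;
            }
        };
    }

    /** Returns the cached representation for the identifier, computing it only on a miss. */
    String request(String identifier) {
        String cached = entries.get(identifier);
        if (cached != null) {
            return cached;
        }
        computations++;
        String value = compute.apply(identifier);
        entries.put(identifier, value);
        return value;
    }

    int computations() {
        return computations;
    }

    public static void main(String[] args) {
        IdentifierCache cache = new IdentifierCache(2, id -> "representation of " + id);
        cache.request("res:/data.xml");
        cache.request("res:/data.xml"); // served from the cache, no recomputation
        System.out.println(cache.computations());
    }
}
```

Because every request carries its full identifier, the cache needs no knowledge of what the computation does; the identifier alone is the key.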

Grammar details

The NetKernel grammar supports nested, optional and interleaved groups, and many more features. Please refer to the online documentation for details. When you download and install NetKernel you can use the grammar debugger called "Grammar's Kitchen", one of the many developer tools available within NetKernel. NetKernel also includes the XUnit logical level unit test framework, which allows you to build complete tests of endpoints, grammars, etc.

Companion Videos

The following video tutorials (in two parts due to YouTube's 10 minute video limit) are intended as a companion to this article, and will guide you through the NetKernel download and installation process, importing of the demonstration module defined above, and how the NetKernel Grammar debugger and Visualization tools work.

Part 1

Part 2

Summary

This article has introduced the NetKernel 4.0 grammar technology and shown that it provides critical flexibility at the boundary between REST web service identifiers and compiled code. The grammar is bi-directional: it can parse an identifier into named parts or build a properly formed identifier from supplied named part values. To learn more about NetKernel 4's grammar technology, download NetKernel 4 Standard Edition from the 1060 Research web site - http://www.1060research.com. The blog "Durable Scope", by one of the NetKernel architects, provides insights into the design and implementation of the NetKernel platform.

References

[1] The NetKernel 4 Standard Edition open platform is available for download from http://download.netkernel.org
[2] The grammar technology is described by documentation in the NetKernel distribution and in online documentation.
[3] The Twitter REST API is documented on the Twitter Developer Website.
[4] A NetKernel module that includes the source code illustrating the use of the Grammar technology is available for download. To learn how to download NetKernel, install this module and make modifications, please view the companion videos, part 1 and part 2.
[5] The companion videos, part 1 and part 2, show how to augment the UserTimelineAccessor class to do more than just reflect the provided information.
[6] The active URI scheme was proposed by HP.
[7] A discussion about NetKernel caching can be found at Tony Butterfield's blog.

discussion by Tom Hicks

So it looks like the NK Grammar is a DSL, written in XML, which defines mappings between URIs and Endpoints. If so, the DSL could be made much more powerful (and concise) by writing it on top of a real programming language, instead of XML.

To me, the most interesting part is the ability to "run the grammar backwards" to construct a URI. Is the grammar used to "validate" the arguments during construction?

Note that DSLs for mapping URLs exist in other systems, too. In Grails, for instance: URL Mapping

Is this a problem that needs solving? by Jim B

This seems great in theory. However, I disagree with this statement: "Web is malleable, but physical code is harder to change". I don't think physical code is very hard to change. You took something as simple as a URL string and turned it into a "context" and an XML file. So anytime the URL does change, the XML file needs to be changed anyway.

Isn't this the kind of XML config file Java developers have been fighting against for years now? Why not just change the URL in the code? Then when a new developer has to change the code in two years, he does not need to find an obscure XML file; it's right there in the code where it is expected. Plus it's straightforward, not a new API with contexts. It's just a string.

Plus, the concept of a "context" has always seemed like a code smell. It means your API is too abstract. What does the context do? Maybe it should be split into more concrete classes. It looks like in this case a context can getThisRequest(), createRequestToEndpoint(), issueRequestForResponse(), and createResponseFrom() and probably much more. I'm really not sure why these functions would be grouped together, other than that it's easier to create an interface with one argument called a context and have that do everything, instead of figuring out what arguments are really needed.

InfoQ.com and all content copyright © 2006-2014 C4Media Inc. InfoQ.com hosted at Contegix, the best ISP we've ever worked with.