
Introduction to NetKernel

Posted by Randolph Kahle on Jan 22, 2008

NetKernel is a software system that combines the fundamental properties of REST and Unix into a powerful abstraction called resource oriented computing (ROC). The core of resource oriented computing is the separation of logical requests for information (resources) from the physical mechanism (code) which delivers it. Applications built using ROC have proven to be small, simple and flexible, and to require less code than other approaches.

Hello World!

In a resource oriented system requests are made for resources and concrete, immutable resource representations are returned. This follows directly from the principles of REST and the World Wide Web. On the Web a request for a resource such as http://www.1060research.com is resolved to an endpoint IP address by the Domain Name System (DNS). The request is sent to that IP address and a web server delivers a response containing a concrete, immutable representation of the resource. Similarly, in NetKernel a resource request is resolved to an endpoint called an Accessor, which is responsible for returning a concrete, immutable representation. NetKernel is implemented in Java, so we will use that language for our first examples; later we will show that Java is one of many supported languages.

A request is delivered to the resolved Accessor's processRequest method. The following example creates an immutable StringAspect that contains the "Hello World" message and returns it in the response:

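public void processRequest(INKFConvenienceHelper context) throws Exception
{
    // Wrap the message in an immutable String representation (an aspect)
    IURAspect aspect = new StringAspect("Hello World");
    // Build a response around the representation and hand it back to the microkernel
    INKFResponse response = context.createResponseFrom(aspect);
    context.setResponse(response);
}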

The context object is an interface to the microkernel which allows an endpoint to bridge between logical requests and physical code. The processRequest method is triggered when a logical request's URI is resolved to this endpoint. Unlike a Java Servlet where a URI request can only come from the Web, URI requests that resolve to NetKernel accessors can come from anywhere including other endpoints.

If a Servlet needs additional information it will use references to other Java objects, creating deep call stacks, for example when using JDBC access to a database. This is where ROC begins and the Web ends. Accessors do not have memory references to other accessors. To obtain additional information or call other software services, they issue sub-requests up into the logical resource address space.

In NetKernel resources are identified by a URI address, again just like in the World Wide Web.

Now we leave the physical level of Java objects and see how a URI address is defined and can be dynamically bound to our code. NetKernel is a modular architecture. Software resources can be packaged in physical containers called modules. As well as being a physical package for deployment, a module defines the logical address space for resources and their relationship to physical code. A module is like a completely self contained micro-world-wide-web for software. The following entry in a module definition maps the logical address "ffcpl:/helloworld" to our accessor, the HelloWorld object.

<ura>
    <match>ffcpl:/helloworld</match>
    <class>org.ten60.netkernel.tutorial.HelloWorld</class>
</ura>

The binding between a request and an endpoint occurs only at the moment the request is resolved. Once the accessor completes, the relationship is decoupled. This is different from Java and other languages where bindings, once made, are permanent. This dynamic logical binding results in very flexible systems which can be reconfigured at runtime.
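To make the idea of late binding concrete, here is a minimal plain-Java sketch, not NetKernel's API: a hypothetical LogicalModule class holds a table from logical URI to endpoint and looks the endpoint up afresh on every request, so rebinding at runtime affects only later requests. Endpoints are modelled, as an assumption, as Suppliers of an immutable String.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical model of a module's logical address space (not NetKernel's API).
class LogicalModule {
    // Logical URI -> endpoint; an endpoint here just supplies an immutable String.
    private final Map<String, Supplier<String>> bindings = new ConcurrentHashMap<>();

    void bind(String uri, Supplier<String> endpoint) {
        bindings.put(uri, endpoint);
    }

    // Resolution happens at request time; no reference to the endpoint is kept afterwards.
    String request(String uri) {
        Supplier<String> endpoint = bindings.get(uri);
        if (endpoint == null) {
            throw new IllegalArgumentException("No endpoint resolves " + uri);
        }
        return endpoint.get();
    }

    public static void main(String[] args) {
        LogicalModule module = new LogicalModule();
        module.bind("ffcpl:/helloworld", () -> "Hello World");
        System.out.println(module.request("ffcpl:/helloworld")); // Hello World
        // Rebinding at runtime changes only subsequent resolutions.
        module.bind("ffcpl:/helloworld", () -> "Hello again");
        System.out.println(module.request("ffcpl:/helloworld")); // Hello again
    }
}
```

Because the caller only ever holds the URI, swapping the endpoint needs no change on the calling side.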

In the logical address space we have new degrees of freedom. We can map requests from logical to logical, logical to physical (as above) or from one address space to another. The following rewrite rule uses a regular expression match to change a request's URI into another; its effect is to map a request such as "ffcpl:/tutorial/helloworld" to "ffcpl:/helloworld".

<rewrite>
    <from>ffcpl:/tutorial/(.*)</from>
    <to>ffcpl:/$1</to>
</rewrite>
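The rewrite rule can be approximated with java.util.regex. This is a sketch of the behaviour, not NetKernel's implementation; we assume the from pattern must match the whole URI and that an unmatched URI passes through unchanged. The class name RewriteRule is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of a <rewrite> rule using java.util.regex (hypothetical class, not NetKernel's).
class RewriteRule {
    private final Pattern from;
    private final String to;

    RewriteRule(String from, String to) {
        this.from = Pattern.compile(from);
        this.to = to; // may refer to capture groups as $1, $2, ...
    }

    // Returns the rewritten URI, or the original URI when the rule does not apply.
    String apply(String uri) {
        Matcher m = from.matcher(uri);
        // Assumption: the pattern must match the entire URI.
        return m.matches() ? m.replaceFirst(to) : uri;
    }

    public static void main(String[] args) {
        RewriteRule rule = new RewriteRule("ffcpl:/tutorial/(.*)", "ffcpl:/$1");
        System.out.println(rule.apply("ffcpl:/tutorial/helloworld")); // ffcpl:/helloworld
    }
}
```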

A module is an encapsulated private logical address space. A module may export a public address space which can then be imported by other modules. In our example, our module exports the public address space "ffcpl:/tutorial/.*", promising to provide all resources located below the tutorial path.

<export>
    <uri>ffcpl:/tutorial/(.*)</uri>
</export>

So far we have not considered where a request comes from. Without an initial starting point nothing would happen in the system. A transport is an endpoint that detects (external) events. When a transport detects an event it creates and issues a root request into the logical address space and waits for an endpoint to be located, bound and scheduled for processing, and to return a representation. A module may host any number of transports and each transport will inject its root requests into the hosting module. For example, the HTTP transport detects the arrival of an HTTP request and then creates and issues a corresponding internal NetKernel root request. The following diagram illustrates the path by which an HTTP request turns into a NetKernel request and travels to our HelloWorld accessor class.

    http://localhost:8080/tutorial/helloworld    (at the transport)
        |
        v
    ffcpl:/tutorial/helloworld                   (request issued by transport)
        |
        v
    ffcpl:/tutorial/helloworld                   (request that enters our module)
        |
        v
    ffcpl:/helloworld                            (after rewrite rule)
        |
        v
    org.ten60.netkernel.tutorial.HelloWorld      (physical accessor code)
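The path in the diagram can be traced with a few lines of plain Java. This is an illustrative sketch under stated assumptions: the HTTP transport is assumed to reuse the URL path under the ffcpl: scheme (as the diagram shows), and the module's <ura> entry is modelled as a Map; RequestPathTrace and its methods are hypothetical names.

```java
import java.net.URI;
import java.util.Map;

// Hypothetical trace of an HTTP request down to the accessor class name.
class RequestPathTrace {
    // Mirrors the <ura> entry: logical URI -> accessor class.
    static final Map<String, String> URA =
        Map.of("ffcpl:/helloworld", "org.ten60.netkernel.tutorial.HelloWorld");

    // Assumption: the transport maps the URL path one-to-one onto the ffcpl: scheme.
    static String toRootRequest(String httpUrl) {
        return "ffcpl:" + URI.create(httpUrl).getPath();
    }

    // Same effect as the <rewrite> rule above.
    static String rewrite(String uri) {
        return uri.replaceFirst("^ffcpl:/tutorial/(.*)$", "ffcpl:/$1");
    }

    static String resolve(String httpUrl) {
        String rewritten = rewrite(toRootRequest(httpUrl));
        String accessor = URA.get(rewritten);
        if (accessor == null) {
            throw new IllegalStateException("unresolved: " + rewritten);
        }
        return accessor;
    }

    public static void main(String[] args) {
        String url = "http://localhost:8080/tutorial/helloworld";
        System.out.println(toRootRequest(url));          // ffcpl:/tutorial/helloworld
        System.out.println(rewrite(toRootRequest(url))); // ffcpl:/helloworld
        System.out.println(resolve(url));                // org.ten60.netkernel.tutorial.HelloWorld
    }
}
```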

Nothing in the system dictates how requests for our Hello World resource originate. Its address space can be simultaneously connected to HTTP, JMS, SMTP, LDAP, ... you name it - external application protocols, or even other layers of our software architecture. Each module in a NetKernel system is a self-contained encapsulated resource oriented sub-system.

Addresses and Address Spaces

In NetKernel a module defines a private address space. Only three things can occur in the private address space:

  • Map a logical address to logical addresses
  • Map a logical address to physical endpoint
  • Import a logical address space from another module

With these three primary relations it becomes possible to craft cleanly layered and channeled architectures. Since these relationships are dynamic the architecture itself is also dynamically composable. The diagram below shows a high level view of a typical NetKernel application. A transport issues root requests into the private address space of a hosting module and NetKernel searches for an endpoint that will process the request. In our example the hosting module imports modules "A", "B" and "C". Each adds its public exported address space to the host module's private address space, specifically ffcpl:/accounting/.*, ffcpl:/payroll/.* and ffcpl:/crm/.*.

If in our example a root request is issued for the URI ffcpl:/crm/contacts it will match the exported address space of module "C" and the request will be sent to that module, eventually being resolved to physical code which can fulfill the request for ffcpl:/crm/contacts, perhaps by using physical objects such as a JDBC connection or more commonly by issuing a sub-request for a logical relational database service available within the scope of module "C".
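The import step can be modelled as matching the request URI against each imported module's exported pattern. A minimal sketch, assuming (for illustration only) that imports are searched in declaration order and the first match wins; HostModule and its methods are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical host module that resolves requests against imported export patterns.
class HostModule {
    // Module name -> exported address space, kept in import order.
    private final Map<String, Pattern> imports = new LinkedHashMap<>();

    void importModule(String name, String exportedSpace) {
        imports.put(name, Pattern.compile(exportedSpace));
    }

    // Assumption: the first imported module whose export matches receives the request.
    Optional<String> resolve(String uri) {
        for (Map.Entry<String, Pattern> e : imports.entrySet()) {
            if (e.getValue().matcher(uri).matches()) {
                return Optional.of(e.getKey());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        HostModule host = new HostModule();
        host.importModule("A", "ffcpl:/accounting/.*");
        host.importModule("B", "ffcpl:/payroll/.*");
        host.importModule("C", "ffcpl:/crm/.*");
        System.out.println(host.resolve("ffcpl:/crm/contacts").orElse("unresolved")); // C
    }
}
```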

Accessors

Next we switch back to the physical level and take a closer look at accessors. As we have seen, accessors are endpoints that return resource representations. Accessors themselves are very simple. They can only do four things:

  1. Interpret what resource is being requested (i.e. by inspecting the initiating request)
  2. Create and issue sub-requests for additional information (as synchronous or asynchronous requests)
  3. Create value by performing their service - doing something!
  4. Create and return an immutable physical resource representation

An accessor discovers from its context what it is being asked to do. The request URI can be retrieved as well as the argument values of named parameters. An accessor may need to know which URI was used by the current request if multiple addresses are mapped to a single endpoint. With logical / physical separation one piece of code may have multiple logical locations mapped to it.

Services are called with named parameters using the active: URI scheme. The active scheme takes the form

active:{service-name}+{parameter-name}@{uri-address}

Each active URI specifies one service and any number of named parameters. For example, the toUpper service takes a single parameter named operand and returns the upper case transformation of the resource identified by the URI supplied as the argument.

active:toUpper+operand@ffcpl:/resources/message.txt
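A simplified parser makes the structure of an active: URI explicit. This is a hypothetical sketch, not NetKernel's parser: it assumes that argument URIs contain no '+' character (real systems need an escaping scheme for nested URIs) and splits each parameter at its first '@'.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified parser for active:{service}+{name}@{uri}+... URIs.
class ActiveUri {
    final String service;
    final Map<String, String> params; // parameter name -> argument URI

    private ActiveUri(String service, Map<String, String> params) {
        this.service = service;
        this.params = params;
    }

    // Assumption: argument URIs contain no '+', so '+' always separates parameters.
    static ActiveUri parse(String uri) {
        if (!uri.startsWith("active:")) {
            throw new IllegalArgumentException("not an active: URI: " + uri);
        }
        String[] parts = uri.substring("active:".length()).split("\\+");
        Map<String, String> params = new LinkedHashMap<>();
        for (int i = 1; i < parts.length; i++) {
            int at = parts[i].indexOf('@'); // the name ends at the first '@'
            if (at < 0) {
                throw new IllegalArgumentException("parameter without '@': " + parts[i]);
            }
            params.put(parts[i].substring(0, at), parts[i].substring(at + 1));
        }
        return new ActiveUri(parts[0], params);
    }

    public static void main(String[] args) {
        ActiveUri a = parse("active:toUpper+operand@ffcpl:/resources/message.txt");
        System.out.println(a.service + " " + a.params); // toUpper {operand=ffcpl:/resources/message.txt}
    }
}
```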

The following BeanShell script implements the toUpper service. It retrieves the immutable aspect for the "operand" resource using the sourceAspect method with the URI this:param:operand. We could use the context object to obtain the calling request, look for the named parameter "operand", obtain its URI and issue a sub-request for that resource. Instead the NetKernel Foundation API provides a local internal logical address space for the arguments of a request. By requesting the URI this:param:operand we are effectively asking to dereference the operand pointer.

import org.ten60.netkernel.layer1.representation.*;
import com.ten60.netkernel.urii.aspect.*;

void main()
{
    // Dereference the "operand" argument and ask for it as a string representation
    sa = context.sourceAspect("this:param:operand", IAspectString.class);
    s = sa.getString();
    // Return a new immutable aspect holding the upper case text
    sa = new StringAspect(s.toUpperCase());
    resp = context.createResponseFrom(sa);
    resp.setMimeType("text/plain");
    context.setResponse(resp);
}
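The this:param: indirection can be imitated in plain Java. In this hypothetical sketch the Context class stands in for the real request context: it holds the request's named arguments (name to URI) and a resource store (URI to text), and dereferences this:param:{name} to the argument's URI before sourcing it. None of these names are NetKernel APIs.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical plain-Java model of the toUpper service and its argument address space.
class ToUpperSketch {
    static final String PARAM_PREFIX = "this:param:";

    // Stand-in for the request context (not the NetKernel Foundation API).
    static final class Context {
        private final Map<String, String> args;      // parameter name -> URI
        private final Map<String, String> resources; // URI -> representation

        Context(Map<String, String> args, Map<String, String> resources) {
            this.args = args;
            this.resources = resources;
        }

        // this:param:{name} is dereferenced to the argument's URI, which is then sourced.
        String sourceString(String uri) {
            String target = uri;
            if (uri.startsWith(PARAM_PREFIX)) {
                target = args.get(uri.substring(PARAM_PREFIX.length()));
                if (target == null) {
                    throw new IllegalArgumentException("no argument bound to " + uri);
                }
            }
            String value = resources.get(target);
            if (value == null) {
                throw new IllegalArgumentException("no resource at " + target);
            }
            return value;
        }
    }

    // The service only knows the local name "operand", never the caller's URI.
    static String toUpper(Context context) {
        return context.sourceString("this:param:operand").toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        Context context = new Context(
            Map.of("operand", "ffcpl:/resources/message.txt"),
            Map.of("ffcpl:/resources/message.txt", "hello world"));
        System.out.println(toUpper(context)); // HELLO WORLD
    }
}
```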

The script specifies that it wants the operand resource returned as an implementation of the IAspectString interface. However, at the logical level, code is not aware of physical level types. This leads to a new concept called transrepresentation. If a client requests a representation type that an endpoint does not provide then the microkernel can intermediate. When a mismatch is detected, the microkernel searches for a Transreptor that can convert from one type to the other.

Transreptors turn out to be very useful. Conceptually, a transreptor converts information from one physical form to another. This covers a significant amount of computer processing including:

  • Object type transformation
  • Object structure transformation
  • Parsing
  • Compiling
  • Serializing

The key point is that this is a lossless transformation: information is preserved while the physical representation is changed. Transreptors help reduce complexity by hiding physical level details from the logical level, allowing developers to focus on what's important - information. For example, a service such as active:xslt requests information as a DOM and the developer working at the logical level provides a resource reference whose representation at the physical level is a text file containing XML. NetKernel will automatically search for a transreptor that can transrept (parse) the textual XML representation into the DOM representation. The architectural and design significance of transreption is type decoupling and increased application and system flexibility.

In addition, transreption allows the system to move information from inefficient forms into efficiently processable forms, for example, source code to byte code. These transitions occur frequently but only require a one-time conversion cost and thereafter can be obtained in the efficient form. In a formal sense, transreption removes entropy from resources.
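The mismatch-then-convert behaviour can be sketched as a registry of converters keyed by source and target type. This is an illustration only: TransreptorRegistry and sourceAs are hypothetical names, the lookup uses the exact runtime class of the representation (a simplification), and the String-to-DOM transreptor uses the standard JDK DocumentBuilderFactory.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical transreptor registry: converts a representation only on a type mismatch.
class TransreptorRegistry {
    private final Map<String, Function<Object, Object>> transreptors = new HashMap<>();

    private static String key(Class<?> from, Class<?> to) {
        return from.getName() + "->" + to.getName();
    }

    <F, T> void register(Class<F> from, Class<T> to, Function<F, T> fn) {
        transreptors.put(key(from, to), o -> fn.apply(from.cast(o)));
    }

    // Simplification: looks up by the exact runtime class of the representation.
    <T> T sourceAs(Object representation, Class<T> wanted) {
        if (wanted.isInstance(representation)) {
            return wanted.cast(representation); // no mismatch, no conversion
        }
        Function<Object, Object> fn = transreptors.get(key(representation.getClass(), wanted));
        if (fn == null) {
            throw new IllegalStateException("no transreptor to " + wanted.getName());
        }
        return wanted.cast(fn.apply(representation));
    }

    // A String -> DOM transreptor: parsing, a lossless change of physical form.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        TransreptorRegistry registry = new TransreptorRegistry();
        registry.register(String.class, Document.class, TransreptorRegistry::parse);
        Document doc = registry.sourceAs("<greeting>hi</greeting>", Document.class);
        System.out.println(doc.getDocumentElement().getTagName()); // greeting
    }
}
```

A real system would also cache the converted form, so the one-time conversion cost described above is not paid again.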

Resource Models

We have seen how a logical level URI address is resolved to a physical level endpoint and bound to it for the time it takes to process. We have seen that physical level concerns such as type can be isolated in the physical level. We also have seen that services can be called with named parameters, all encoded as URI addresses.

This leads to the idea of a resource model, a collection of physical resource representation types (object models) and associated services (accessors) that together provide a toolset around a particular form of information, for example, binary streams, XML documents, RDF graphs, SQL statements and result sets, images, JMS messages, SOAP messages, etc. The idea of a resource model allows a developer to build composite applications out of one or several resource models in combination, echoing the Unix philosophy of having special reusable tools rapidly combined together to create solutions.

The Image resource model includes services such as imageCrop, imageRotate, imageDither and more. Using the image resource model a developer can create image processing pipelines, all with simple requests such as:

active:imageCrop+operator@ffcpl:/crop.xml+operand@http://1060research.com/images/logo.png

NetKernel's XML resource model includes transformation languages, several validation languages and many other XML technologies. Above this, a specialization of the XML resource model is the PiNKY feed processing toolkit that supports ATOM, RSS, and many simple feed operations and is 100% downward compatible with the XML resource model. With transreptors, a developer need not know if an XML resource is physically a DOM, SAX stream or one of the several possible representation types. Using the XML resource model developers can quickly build XML processing systems. For example the following request uses the XSLT service to transform the resource ffcpl:/data.xml with the style sheet resource ffcpl:/style.xsl:

active:xslt+operator@ffcpl:/style.xsl+operand@ffcpl:/data.xml

Sequencing

Resource request URIs are essentially "op-codes" for the resource oriented computing model. Just like Java byte-codes, they are generally too low level and would be difficult to code manually. Instead one can use a number of scripting languages to define and issue these requests. The context object we saw earlier is an example of a uniform, POSIX-like abstraction around the microkernel called the NetKernel Foundation API. This API is available to any supported dynamic procedural language. In addition, specialist declarative languages are provided whose sole purpose is to define and issue source requests.

One such scripting language is DPML, a simple language that uses an XML syntax. Why XML syntax? Because in a dynamic, loosely coupled system where code is a resource like any other, it is very straightforward to create processes that dynamically generate code, and XML syntax is an easy output format for code generation. To give a flavor of DPML, the following instruction requests the same XSLT transform as in the preceding section. Each "instr" corresponds to an active: URI request and each "target" is an assignment to another resource. The URI this:response is used as a convention to indicate the resource to be returned by the script.

<instr>
    <type>xslt</type>
    <operator>ffcpl:/style.xsl</operator>
    <operand>ffcpl:/data.xml</operand>
    <target>this:response</target>
</instr>

From this foundation it is easy to interpret the following DPML program that creates an HTML page from a database in two requests:

<idoc>
    <instr>
        <type>sqlQuery</type>
        <operand><sql>SELECT * from customers;</sql></operand>
        <target>var:result</target>
    </instr>
    <instr>
        <type>xslt</type>
        <operand>var:result</operand>
        <operator>ffcpl:/stylepage.xsl</operator>
        <target>this:response</target>
    </instr>
</idoc>
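The sequencing model of the DPML program can be imitated by a tiny interpreter: run each instruction in order, substitute earlier targets (var:result) into later arguments, and return whatever was assigned to this:response. This is a toy sketch, not NetKernel's DPML runtime; DpmlSketch, Instr and the stand-in sqlQuery and xslt services are hypothetical, and non-var arguments are simply passed through as URIs.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Toy sequencer in the spirit of DPML (hypothetical, not NetKernel's implementation).
class DpmlSketch {
    // One <instr>: service type, named arguments, and the target URI for the result.
    record Instr(String type, Map<String, String> args, String target) {}

    static String run(List<Instr> program, Map<String, Function<Map<String, String>, String>> services) {
        Map<String, String> targets = new HashMap<>(); // var:... and this:response
        for (Instr instr : program) {
            Map<String, String> resolved = new LinkedHashMap<>();
            // An argument naming an earlier target is replaced by that target's value;
            // anything else (e.g. ffcpl:/stylepage.xsl) is passed through unchanged.
            instr.args().forEach((name, value) -> resolved.put(name, targets.getOrDefault(value, value)));
            Function<Map<String, String>, String> service = services.get(instr.type());
            if (service == null) {
                throw new IllegalArgumentException("unknown service: " + instr.type());
            }
            targets.put(instr.target(), service.apply(resolved));
        }
        return targets.get("this:response");
    }

    public static void main(String[] args) {
        // Hypothetical stand-ins for the real sqlQuery and xslt services.
        Map<String, Function<Map<String, String>, String>> services = Map.of(
            "sqlQuery", a -> "<results><row>Ada</row></results>",
            "xslt", a -> "<html>" + a.get("operand") + "</html>");
        List<Instr> program = List.of(
            new Instr("sqlQuery", Map.of("operand", "SELECT * from customers;"), "var:result"),
            new Instr("xslt", Map.of("operand", "var:result", "operator", "ffcpl:/stylepage.xsl"), "this:response"));
        System.out.println(run(program, services)); // <html><results><row>Ada</row></results></html>
    }
}
```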

In NetKernel, language runtimes are services. Like any other service they are stateless, executing a program when the program code is transferred to them as the state of the request. This is very different from the traditional view of software at the physical level, where languages sit in front of information instead of playing a facilitating role for information. For example, to use the Groovy language runtime service, the following request provides the resource ffcpl:/myprogram.gy containing the program as the state for the request.

active:groovy+operator@ffcpl:/myprogram.gy

NetKernel supports a wide range of languages including BeanShell, Groovy, Ruby, JavaScript, Python, DPML, XML languages such as XQuery and, of course, dynamically compiled Java. Any language that runs on the Java virtual machine can be integrated into NetKernel, including custom languages such as workflow engines.

Patterns

ROC presents a new set of architectural design patterns in the logical level. Let's look at two examples: Mapper and Gatekeeper.

The Mapper pattern is a way to direct a bounded infinite set of resource requests to a single physical point of code. In this pattern, a request for a resource in one space is mapped to physical code which interprets and reissues each request into the mapped address space. The response of the second mapped request is returned by the mapper as the result of the first request.

This pattern has many variants. One, the active:mapper service, uses a resource containing a routing map between address spaces. Another is the Gatekeeper, which provides access control for all requests entering an address space: it admits a request only when sufficient credentials are available to validate it.

All variants of the mapper pattern may be transparently layered over any application address space. Other uses of this pattern include auditing, logging, semantic and structural validation, and any other appropriate constraint. A particular strength of this pattern is that it can be introduced in the application without interfering with its architectural design.
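Both variants can be expressed as wrappers around a common address-space interface, which is what lets them be layered over an application without touching it. The sketch below is hypothetical plain Java: AddressSpace, Request (a URI plus a user name as a stand-in for credentials), mapper and gatekeeper are illustrative names, and the mapper uses a simple prefix mapping rather than a routing-map resource.

```java
import java.util.Set;

// Hypothetical sketch of the Mapper and Gatekeeper patterns as layered address spaces.
class GatekeeperSketch {
    // A request carries its URI plus (hypothetical) credentials.
    record Request(String uri, String user) {}

    // Any address space resolves a request to an immutable representation.
    interface AddressSpace {
        String source(Request request);
    }

    // Mapper: reissues each request into another address space and returns that response.
    static AddressSpace mapper(AddressSpace target, String fromPrefix, String toPrefix) {
        return request -> {
            if (!request.uri().startsWith(fromPrefix)) {
                throw new IllegalArgumentException("unmapped: " + request.uri());
            }
            String mapped = toPrefix + request.uri().substring(fromPrefix.length());
            return target.source(new Request(mapped, request.user()));
        };
    }

    // Gatekeeper: admits a request only when its credentials are acceptable.
    static AddressSpace gatekeeper(AddressSpace inner, Set<String> allowedUsers) {
        return request -> {
            if (request.user() == null || !allowedUsers.contains(request.user())) {
                throw new SecurityException("request denied: " + request.uri());
            }
            return inner.source(request);
        };
    }

    public static void main(String[] args) {
        AddressSpace app = request -> "data for " + request.uri();
        AddressSpace secured = gatekeeper(mapper(app, "ffcpl:/public/", "ffcpl:/internal/"), Set.of("alice"));
        System.out.println(secured.source(new Request("ffcpl:/public/report", "alice")));
        // data for ffcpl:/internal/report
    }
}
```

The application endpoint (app) is unchanged; constraints are added purely by composition.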

Because the relationship between software in NetKernel is logically linked and dynamically resolved, interception and transformation of requests is a completely natural model. The logical address space exhibits in a very uniform way all of the characteristics found in specialist physical level technologies such as AOP.

Application Development

Building applications using ROC is straightforward. If any new physical level capabilities are required, such as a new resource model, the necessary accessors, transreptors, etc. are constructed. Then, at the logical level, applications are composed by identifying and aggregating resources. Finally, constraints are applied, such as request throttles, security Gatekeepers and data validation.

The three "C"s of ROC (construct, compose, constrain) are applied in that order. This order can be reversed to make changes: constraints can be lifted, revealing the composed application and allowing changes, and subsequently the constraints can be reapplied. This differs from object-oriented programming, where constraints are an initial consideration - classes inherently impose constraints on the use of their objects and hence on the information they contain. Changes to information structure in a physical object-oriented system initiate a ripple of events (recompilation, distribution, system restarts, etc.), none of which is necessary in a NetKernel system. When compared to the flexibility of a logical system, physical level object-oriented systems appear brittle.

Application Architecture

Systems designed with physical level technologies usually rely on mutable objects residing at various levels of an architecture. For example, object-relational mapping technologies such as Hibernate exist to create a layer of objects whose state matches that of a persistent store managed by an RDBMS. In such a design, updates are applied to objects and it is the responsibility of the mapping layer to migrate those changes to a relational database.

With ROC all representations are immutable. This leads immediately to two architectural consequences - first, caching of immutable objects can dramatically improve performance (more on this later) and second, immutable objects cannot be updated - they must be invalidated and re-requested.

While there are many valid application architectures that can be implemented with ROC, a data channels approach is commonly seen. In this design as with many, the application is composed of logically layered address spaces and passing vertically through these layers are separate read and write channels for application information. These channels might have addresses such as ffcpl:/customers or ffcpl:/register-user.

In the diagram below an integration layer translates the form of information from different sources into common structures. The read information channels support resources such as ffcpl:/customers, which return representations of desired information. In the write channels, URI addressed services such as ffcpl:/register-user do two things: first they update persistent storage, and second they invalidate any cached resource representations that depend on the updated information. To developers used to the OR mapping approach (with e.g. Hibernate) this may seem very strange. In fact, it is a simple, elegant and high performance solution.

Performance

By now you must be thinking that ROC systems will spend more time running the abstraction than doing real work. However, and counter-intuitively, the ROC abstraction yields significant performance advantages.

Caching

Since every resource is identified by a URI address and the result of requesting a resource is an immutable resource representation, any computed resource can be cached using the URI address as the cache key. In addition to computed resources, NetKernel's cache stores meta information about resource dependencies and the cost of computing each cached entry. Using the dependency information, the cache guarantees that each cached resource remains valid as long as all the resources it depends upon are also valid. If a resource becomes invalid, then its cached representation and all dependent resource representations are atomically invalidated.
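Dependency-driven invalidation can be sketched as a cache that records, for each URI, which cached URIs were computed from it, and cascades removal through that graph. This is a hypothetical illustration, not NetKernel's cache: DependencyCache is an invented name, and a single monitor (synchronized methods) stands in for atomicity, so other threads never observe a half-invalidated set. The URI ffcpl:/db/customers is also hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical URI-keyed cache with dependency-driven invalidation.
class DependencyCache {
    private final Map<String, String> entries = new HashMap<>();
    // For each URI, the cached URIs that were computed from it.
    private final Map<String, Set<String>> dependents = new HashMap<>();

    synchronized void put(String uri, String representation, Set<String> dependsOn) {
        entries.put(uri, representation);
        for (String dep : dependsOn) {
            dependents.computeIfAbsent(dep, k -> new HashSet<>()).add(uri);
        }
    }

    synchronized String get(String uri) {
        return entries.get(uri);
    }

    // Invalidates a resource and, transitively, everything computed from it.
    // Removing the dependents entry before recursing also stops cycles.
    synchronized void invalidate(String uri) {
        entries.remove(uri);
        Set<String> deps = dependents.remove(uri);
        if (deps != null) {
            for (String d : deps) {
                invalidate(d);
            }
        }
    }

    public static void main(String[] args) {
        DependencyCache cache = new DependencyCache();
        cache.put("ffcpl:/customers", "<customers/>", Set.of("ffcpl:/db/customers"));
        cache.put("ffcpl:/customers.html", "<html/>", Set.of("ffcpl:/customers"));
        cache.invalidate("ffcpl:/db/customers"); // e.g. after a write via ffcpl:/register-user
        System.out.println(cache.get("ffcpl:/customers.html")); // null
    }
}
```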

NetKernel uses the stored computational cost information to retain a dynamically optimal set of resources: resources in the system's current working set judged valuable by their frequency of use and the cost to recompute them if ejected from the cache. The operational result of NetKernel's cache is the systemic elimination of many redundant computations. Empirical evidence from operational systems indicates that typically between 30% and 50% of resource requests are satisfied from the cache in regular business applications. In the limit of read-mostly applications this can rise to nearly 100%, giving a dynamic system with pseudo static performance. Furthermore, as the character of the system load changes over time, the cache rebalances, retaining resources that are currently most valuable.
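A cost-aware eviction policy can be sketched by scoring each entry on both its use frequency and its recomputation cost. The scoring formula here, (hits + 1) × cost, is an assumption chosen for illustration and is not NetKernel's algorithm; CostAwareCache is a hypothetical name.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical cache that evicts the entry that is cheapest to lose.
class CostAwareCache {
    private static final class Entry {
        final String value;
        final long costMillis; // measured cost to recompute
        long hits;

        Entry(String value, long costMillis) {
            this.value = value;
            this.costMillis = costMillis;
        }
    }

    private final int capacity;
    private final Map<String, Entry> entries = new HashMap<>();

    CostAwareCache(int capacity) {
        this.capacity = capacity;
    }

    synchronized String get(String uri) {
        Entry e = entries.get(uri);
        if (e == null) {
            return null;
        }
        e.hits++;
        return e.value;
    }

    synchronized void put(String uri, String value, long costMillis) {
        if (!entries.containsKey(uri) && entries.size() >= capacity) {
            evictLeastValuable();
        }
        entries.put(uri, new Entry(value, costMillis));
    }

    // Assumed value heuristic: frequently used, expensive entries are kept longest.
    private void evictLeastValuable() {
        String victim = null;
        long lowest = Long.MAX_VALUE;
        for (Map.Entry<String, Entry> me : entries.entrySet()) {
            long score = (me.getValue().hits + 1) * me.getValue().costMillis;
            if (score < lowest) {
                lowest = score;
                victim = me.getKey();
            }
        }
        if (victim != null) {
            entries.remove(victim);
        }
    }
}
```

With capacity 2, an expensive entry (cost 100) outlives a cheap one (cost 5) even after the cheap one has been read once, because 2 × 5 is still less than 1 × 100.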

Scaling with CPU cores

As described in the introduction, the essence of ROC is the separation of the logical information process from its physical implementation. Each request for a logical resource must ultimately be assigned to a physical thread for execution. The microkernel implementing the ROC system can optimally exploit computational hardware as it repeatedly schedules an available thread to execute each logical request. Essentially the logical information system is load balanced across available CPU cores.

Asynchronicity

ROC is innately asynchronous. The NetKernel Foundation API presents an apparently synchronous model; however, the microkernel internally schedules all requests asynchronously. A developer can therefore think with the logical clarity of sequential synchronous code while transparently gaining the ability to scale an application across wide multi-core architectures.

Additionally, accessors may be explicitly marked as being thread-safe or not, signaling to the microkernel whether it can schedule concurrent requests. This allows adoption and integration of libraries and other third party contributions without fear of unpredictable results.
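The combination of asynchronous scheduling, a synchronous-looking call and a thread-safety flag can be illustrated with java.util.concurrent. This is an analogy, not the NetKernel microkernel: RequestScheduler is a hypothetical class, an endpoint is modelled as a Function from URI to representation, and a non-thread-safe endpoint is assumed to be served by a single thread so that its requests never run concurrently.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

// Hypothetical scheduler: asynchronous underneath, synchronous-looking on top.
class RequestScheduler implements AutoCloseable {
    private final ExecutorService pool;
    private final Function<String, String> endpoint;

    RequestScheduler(Function<String, String> endpoint, boolean threadSafe) {
        this.endpoint = endpoint;
        // Thread-safe endpoints may use every core; others are serialised on one thread.
        this.pool = threadSafe
            ? Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors())
            : Executors.newSingleThreadExecutor();
    }

    // Asynchronous issue: the request is queued for the next free thread.
    CompletableFuture<String> issueAsync(String uri) {
        return CompletableFuture.supplyAsync(() -> endpoint.apply(uri), pool);
    }

    // Apparently synchronous issue: blocks the caller until the representation is ready.
    String issue(String uri) {
        return issueAsync(uri).join();
    }

    // Fan out several sub-requests at once, then wait for all of them.
    List<String> issueAll(List<String> uris) {
        List<CompletableFuture<String>> pending = uris.stream().map(this::issueAsync).toList();
        return pending.stream().map(CompletableFuture::join).toList();
    }

    @Override
    public void close() {
        pool.shutdown();
    }

    public static void main(String[] args) {
        try (RequestScheduler scheduler = new RequestScheduler(uri -> "representation of " + uri, true)) {
            System.out.println(scheduler.issue("ffcpl:/helloworld"));
            System.out.println(scheduler.issueAll(List.of("ffcpl:/a", "ffcpl:/b")));
        }
    }
}
```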

Summary

NetKernel is radically different, not in order to create another technology stack, but in order to take a simple set of core principles (those of the Web, Unix and set theory) and extrapolate them into a coherent information processing system. In fact, NetKernel's genesis was the question, "Can the economic properties of the Web be transferred to the fine-grained nature of software systems?"

NetKernel has been production hardened over nearly eight years, from its inception at Hewlett-Packard Labs through corporate enterprise architectures and even as the next generation platform for key web infrastructure (Purl.org). NetKernel has proven to be entirely general purpose and can be applied as easily to data integration, message processing or web / email Internet applications as to any other application.

InfoQ.com and all content copyright © 2006-2014 C4Media Inc.