AtomServer – The Power of Publishing for Data Distribution – Part Two

Posted by Chris Berry & Bryon Jacob on Sep 26, 2008

In Part One of this series we introduced AtomServer, an extensible open source framework for creating Atom Stores. Atom Stores are a new trend in data services architecture based on coupling the Atom Publishing Protocol with GData-style extensions.

AtomServer was extracted from a heavily loaded, highly available data distribution system at Homeaway.com. Because it was extracted from a working system interacting with a large and varied set of clients, AtomServer evolved with a particularly client-centric focus. This motivated the creation of several extensions to the AtomPub specification, the most notable of which are Auto-Tagging, Batching, and Aggregate Feeds.

Auto-Tagging

Categories are one of the most useful concepts in AtomPub. An Atom Category is essentially a name-value pair (in Atom these are called Scheme and Term), associated with an Atom Entry as additional metadata. This is a powerful concept because it enables clients to categorize data and apply new relationships as needed without manipulating the original data source.

AtomServer maintains Category metadata on Entries automatically. It also provides a mechanism to perform full Boolean queries against these Categories (e.g. ANDs and ORs). (GData provides a similar mechanism, which was explained in detail in Part One of this series.)

With categories playing such a significant role in AtomServer, one of the first features we prioritized was the ability to "auto tag" an Entry. An auto tagger automatically computes Categories from the contents of an Entry each time it is submitted (PUT).

Auto tagging is accomplished by implementing an EntryAutoTagger interface. EntryAutoTaggers are wired into an AtomServer Workspace using an IoC container such as Spring.

The most common Entry content type is XML, so AtomServer provides an XPathAutoTagger implementation. This EntryAutoTagger is easily configured to generate Categories for an Entry by executing XPath expressions against its content XML.

For example, from this content XML:

<widget id="123">
<color>red</color>
<size>small</size>
</widget>

XPath expressions like /widget/color and /widget/size might generate the Categories (urn:color)red and (urn:size)small. The client does not have to explicitly create these Categories or even be aware that they exist. Rather, the XPathAutoTagger reads the XML content and creates the Categories automatically. Feed readers then pull categorized Feeds knowing that any required Categories are present. Furthermore, the system doesn’t depend on potentially unreliable clients to manage Categories explicitly.
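The idea can be illustrated with a minimal Python sketch. This is not AtomServer's XPathAutoTagger (which is a Java component configured through the IoC container); the TAG_RULES mapping and the auto_tag function are hypothetical names used only for illustration. Because Python's ElementTree supports only a subset of XPath, the expressions are written relative to the root element, so /widget/color becomes color.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule table: Category scheme -> path relative to the root element.
# "/widget/color" is expressed as "color" evaluated from the <widget> root,
# because ElementTree does not accept absolute paths on an element.
TAG_RULES = {
    "urn:color": "color",
    "urn:size": "size",
}


def auto_tag(content_xml, rules=TAG_RULES):
    """Return a sorted list of (scheme, term) pairs computed from content_xml."""
    root = ET.fromstring(content_xml)
    categories = set()
    for scheme, path in rules.items():
        for node in root.findall(path):
            term = (node.text or "").strip()
            if term:  # skip empty elements rather than emit an empty Term
                categories.add((scheme, term))
    return sorted(categories)


WIDGET_XML = """
<widget id="123">
  <color>red</color>
  <size>small</size>
</widget>
"""

for scheme, term in auto_tag(WIDGET_XML):
    print(f"({scheme}){term}")
```

For the widget above, this yields the two Categories named in the text, (urn:color)red and (urn:size)small.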

Aggregate Feeds

Atom Stores often contain multiple distinct Workspaces and/or Collections composed of interrelated data. For example, a company might have an Atom Store containing one Workspace with an Entry for each employee, and another Workspace containing meetings between sets of employees. The payroll department may want to pull a Feed of employees for paycheck processing and the building administrator needs to monitor meeting bookings. Both of these examples are cases of the simplest AtomPub Feeds.

As another example, let’s say there are also Project Managers wanting a Feed of all meetings joined by each employee. Rather than force the managers to pull two separate Feeds while correlating the data sets, as they would with conventional AtomPub, AtomServer adds the concept of an Aggregate Feed. Aggregate Feeds allow for any Entries, even those from separate Workspaces and Collections, to be joined into a single data Feed.

AtomServer constructs Aggregate Feeds using the powerful concept of Categories. A particular Scheme (the first half of a Category’s Scheme/Term pair) is chosen to be the Join Scheme for an Aggregate Feed. Two or more Entries that should be joined as an aggregate must be tagged with the same Term in the same Join Scheme.

Aggregate Feeds are specified using a URI where the Workspace is specified as $join, and the Collection is specified as some Join Scheme (i.e. $join/{Join Scheme}). An Aggregate Feed contains one Entry for each unique Term existing for the Scheme specified in its URI. The content of each such Entry is an aggregate element (an AtomServer-specific extension element) containing the component Entries of the aggregate (i.e. all Entries that were tagged with this aggregate’s Scheme/Term pair).

Coming back to our example, let’s imagine the company has the following data in the Atom Store (designated by the standard AtomServer URI structure: {workspace}/{collection}/{entryId}).

/employees/acme/cberry.xml
<employee name="Chris Berry" id="123" dept="dev"/>
/employees/acme/bjacob.xml
<employee name="Bryon Jacob" id="345" dept="dev"/>
/meetings/acme/standup.xml
<meeting name="standup" time="Every Tuesday 9:15" >
<employee id="123" />
<employee id="345" />
</meeting>

Before the Project Managers can pull an Aggregate Feed of "employees with their meetings," we must create an appropriate Atom Category on each Entry we want joined together, where the unique Category scheme defines an "aggregate Category," and each Category term in that scheme defines an "aggregate Entry." This would most likely be done using the auto-tagging mechanism described above.

If we had set up our Join Scheme as urn:EID, then we would define the following Categories for our Entries:

/employees/acme/cberry.xml <- (urn:EID)123
/employees/acme/bjacob.xml <- (urn:EID)345
/meetings/acme/standup.xml <- (urn:EID)123, (urn:EID)345

With these Categories applied to the Entries, the Project Manager can now pull an Aggregate Feed based on the urn:EID scheme. Since the Workspace for all Aggregate Feeds is $join, the appropriate URL is then:

http://your.atomserver/$join/urn:EID

which returns an Aggregate Feed with one Entry for each unique term in the urn:EID scheme.
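Conceptually, the join amounts to grouping Entries by the Terms they carry in the Join Scheme. The following Python sketch illustrates that grouping for the example data. It says nothing about how AtomServer implements the join internally; the ENTRIES list and the aggregate function are hypothetical.

```python
from collections import defaultdict

# (entry URI, categories) pairs mirroring the example above.
# The tuple layout is illustrative only.
ENTRIES = [
    ("/employees/acme/cberry.xml", [("urn:EID", "123")]),
    ("/employees/acme/bjacob.xml", [("urn:EID", "345")]),
    ("/meetings/acme/standup.xml", [("urn:EID", "123"), ("urn:EID", "345")]),
]


def aggregate(entries, join_scheme):
    """Group entry URIs by each Term they carry in join_scheme."""
    groups = defaultdict(list)
    for uri, categories in entries:
        for scheme, term in categories:
            if scheme == join_scheme:
                groups[term].append(uri)
    return dict(groups)


for term, uris in sorted(aggregate(ENTRIES, "urn:EID").items()):
    print(f"$join/urn:EID/{term}.xml -> {uris}")
```

The standup meeting carries both Terms, so it appears as a component of both aggregate Entries.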

The contents of each Entry are an <aggregate> element that contains the set of Entry XMLs for each "real" Entry mapping to the Category. For our example, an Aggregate Feed response would resemble the following. Note that many of the XML elements have been pruned for brevity.

<feed xmlns="http://www.w3.org/2005/Atom"
xmlns:as="http://atomserver.org/namespaces/1.0/">
<as:endIndex>16573</as:endIndex>
<id>tag:atomserver.org,2008:v1:urn:EID</id>
<entry>
<id>/atomserver/v1/$join/urn:EID/345.xml</id>
<as:entryId>345</as:entryId>
<content type="application/xml">
<aggregate
xmlns="http://schemas.atomserver.org/atomserver/v1/rev0">
<entry xmlns="http://www.w3.org/2005/Atom">
<id>/atomserver/v1/employees/acme/bjacob.xml</id>
<as:entryId>bjacob</as:entryId>
<as:workspace>employees</as:workspace>
<as:collection>acme</as:collection>
<content type="application/xml">
<employee
xmlns="http://schemas.atomserver.org/examples"
name="Bryon Jacob" id="345" dept="dev" />
</content>
</entry>
<entry xmlns="http://www.w3.org/2005/Atom">
<id>/atomserver/v1/meetings/acme/standup.xml</id>
<as:entryId>standup</as:entryId>
<as:workspace>meetings</as:workspace>
<as:collection>acme</as:collection>
<content type="application/xml">
<meeting
xmlns="http://schemas.atomserver.org/examples"
name="standup" time="Every Tuesday 9:15">
<employee id="123" />
<employee id="345" />
</meeting>
</content>
</entry>
</aggregate>
</content>
</entry>
<entry>
<id>/atomserver/v1/$join/urn:EID/123.xml</id>
<as:entryId>123</as:entryId>
<content type="application/xml">
<aggregate
xmlns="http://schemas.atomserver.org/atomserver/v1/rev0">
<entry xmlns="http://www.w3.org/2005/Atom">
<id>/atomserver/v1/employees/acme/cberry.xml</id>
<as:entryId>cberry</as:entryId>
<as:workspace>employees</as:workspace>
<as:collection>acme</as:collection>
<content type="application/xml">
<employee
xmlns="http://schemas.atomserver.org/examples"
name="Chris Berry" id="123" dept="dev" />
</content>
</entry>
<entry xmlns="http://www.w3.org/2005/Atom">
<id>/atomserver/v1/meetings/acme/standup.xml</id>
<as:entryId>standup</as:entryId>
<as:workspace>meetings</as:workspace>
<as:collection>acme</as:collection>
<content type="application/xml">
<meeting
xmlns="http://schemas.atomserver.org/examples"
name="standup" time="Every Tuesday 9:15">
<employee id="123" />
<employee id="345" />
</meeting>
</content>
</entry>
</aggregate>
</content>
</entry>
</feed>

There are several important things to note about Aggregate Feeds:

  • The sequence number for an Aggregate Feed (the number returned in <as:endIndex> and used for consistent Feed paging in AtomServer) is equal to the highest sequence number from any of its "child Entries." The important result is that if any of the components of an aggregate change, the aggregate is returned the next time the Aggregate Feed is pulled.
  • The set of Categories for an aggregate Entry is the union of the Categories on all its components.
  • Aggregate Feeds may also be subject to a Category search. So, the request /$join/urn:EID/-/(urn:department)dev would pull all the aggregate Entries from our Feed that have the (urn:department)dev Category defined.
  • A localized Aggregate Feed is specified in the same way it is for normal Feeds (i.e. /$join/urn:EID?locale=en_US). A localized aggregate Entry only exists if at least one of its components is localized in the given locale. If there are no localized Entries, then that aggregate is not returned in the feed.
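The first two rules can be sketched in a few lines of Python. The per-component sequence numbers below are invented for illustration (only the feed-level endIndex of 16573 appears in the example), the (urn:department)dev Category on the employee Entry is assumed, and summarize_aggregate is a hypothetical helper rather than part of AtomServer.

```python
# Components of the aggregate for Term 123. The "seq" values and the
# (urn:department)dev Category are assumptions made for this sketch.
COMPONENTS = [
    {
        "uri": "/employees/acme/cberry.xml",
        "seq": 16570,
        "categories": {("urn:EID", "123"), ("urn:department", "dev")},
    },
    {
        "uri": "/meetings/acme/standup.xml",
        "seq": 16573,
        "categories": {("urn:EID", "123"), ("urn:EID", "345")},
    },
]


def summarize_aggregate(components):
    """Return (sequence number, categories) for an aggregate Entry."""
    # The aggregate is as "new" as its most recently changed component.
    seq = max(c["seq"] for c in components)
    # The aggregate carries the union of its components' Categories.
    categories = set().union(*(c["categories"] for c in components))
    return seq, categories


seq, categories = summarize_aggregate(COMPONENTS)
print(seq)
print(sorted(categories))
```

Under these assumptions, the aggregate would match the Category search /-/(urn:department)dev even though only the employee Entry carries that Category.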

Aggregate feeds define three new XML elements on Entries inside an <aggregate> element.

  • <as:workspace> - contains the workspace of the component Entry
  • <as:collection> - contains the collection of the component Entry
  • <as:locale> - contains the locale of the component Entry (if any)

These elements give consumers of Aggregate Feeds an easy way to programmatically determine the characteristics of each component of an aggregate.
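As a rough illustration, a client could read these elements with Python's standard ElementTree module. The sketch below uses a trimmed copy of the example response and the namespaces shown there; the components function is a hypothetical helper, not part of any AtomServer client library.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as they appear in the example response above.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "as": "http://atomserver.org/namespaces/1.0/",
    "agg": "http://schemas.atomserver.org/atomserver/v1/rev0",
}

# Trimmed version of the example Aggregate Feed (one aggregate Entry only).
FEED_XML = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:as="http://atomserver.org/namespaces/1.0/">
  <entry>
    <as:entryId>345</as:entryId>
    <content type="application/xml">
      <aggregate xmlns="http://schemas.atomserver.org/atomserver/v1/rev0">
        <entry xmlns="http://www.w3.org/2005/Atom">
          <as:entryId>bjacob</as:entryId>
          <as:workspace>employees</as:workspace>
          <as:collection>acme</as:collection>
        </entry>
        <entry xmlns="http://www.w3.org/2005/Atom">
          <as:entryId>standup</as:entryId>
          <as:workspace>meetings</as:workspace>
          <as:collection>acme</as:collection>
        </entry>
      </aggregate>
    </content>
  </entry>
</feed>"""


def components(feed_xml):
    """Map each aggregate Term to (workspace, collection, entryId, locale) tuples."""
    root = ET.fromstring(feed_xml)
    result = {}
    # Only direct children of <feed> are aggregate Entries.
    for agg_entry in root.findall("atom:entry", NS):
        term = agg_entry.findtext("as:entryId", namespaces=NS)
        parts = []
        for comp in agg_entry.findall("atom:content/agg:aggregate/atom:entry", NS):
            parts.append((
                comp.findtext("as:workspace", namespaces=NS),
                comp.findtext("as:collection", namespaces=NS),
                comp.findtext("as:entryId", namespaces=NS),
                comp.findtext("as:locale", namespaces=NS),  # None when absent
            ))
        result[term] = parts
    return result


for term, parts in components(FEED_XML).items():
    print(term, parts)
```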

Batching

Because there is implicit round-trip overhead when making separate server calls for each Entry, there are system performance benefits from batching groups of operations into a single request (POST, PUT, or DELETE). Unfortunately, AtomPub does not provide a batching mechanism, so we added batching capability to AtomServer (inspired by similar functionality in GData).

A more RESTful way to implement batching might have been to use the Multipart capabilities within HTTP. However, this technique was deemed too complex for the clients, especially considering the wide range of languages our clients speak. Instead, we chose a URL-based scheme.

We also briefly considered using a custom HTTP verb, such as BATCH, to indicate a batch was being written. But again, in order to maximize client interoperability, we opted to stick to the "standard" set of HTTP methods, and use a different URI to indicate batch operations.

Batch operations in AtomServer are accomplished by doing a PUT to a "virtual" entry in a Collection using a special EntryId named $batch. For example, this URL

PUT http://your-atom-server/widgets/acme/$batch

indicates a batch operation on the acme Collection in the widgets Workspace. Note the structure of this URI implies a given batch operation only applies to the Entries within a particular Workspace and Collection.

The XML content of the request for a batch PUT is an Atom Feed, which contains an Atom Entry for each item in the batch. Entries are formed exactly as they are for individual requests, plus a few possible extensions.

AtomServer declares a new extension XML element, <asbatch:operation>, in the http://atomserver.org/namespaces/1.0/batch namespace that allows the client to specify whether each Entry in the Feed is an update (PUT), an insert (POST), or a delete (DELETE). These elements may be applied either globally to the entire batch Feed or individually to each Entry.

For example, the following batch request:

<feed xmlns="http://www.w3.org/2005/Atom"
xmlns:asbatch="http://atomserver.org/namespaces/1.0/batch">
<entry>
<asbatch:operation type="update" />
<link href="/widgets/acme/123.xml/*" rel="edit" />
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<widget id="123">
<color>red</color>
<size>small</size>
</widget>
</div>
</content>
</entry>
<entry>
<asbatch:operation type="delete" />
<link href="/widgets/acme/234.xml/*" rel="edit" />
</entry>
</feed>

attempts to update Entry 123 and delete Entry 234.
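A client could assemble such a request programmatically. The following Python sketch builds an equivalent Feed with the standard ElementTree module; build_batch and OPERATIONS are hypothetical names, and the sketch is only one possible way to produce the document. The serializer emits explicit namespace prefixes (atom:, xhtml:) instead of the default namespaces used in the handwritten example, and it leaves the widget payload without a namespace, which is an assumption about what the server accepts.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
ASBATCH = "http://atomserver.org/namespaces/1.0/batch"
XHTML = "http://www.w3.org/1999/xhtml"

# Prefixes used when serializing; the namespace URIs are what matter.
ET.register_namespace("atom", ATOM)
ET.register_namespace("asbatch", ASBATCH)
ET.register_namespace("xhtml", XHTML)

# (operation type, edit link, content XML or None), mirroring the example.
OPERATIONS = [
    ("update", "/widgets/acme/123.xml/*",
     '<widget id="123"><color>red</color><size>small</size></widget>'),
    ("delete", "/widgets/acme/234.xml/*", None),
]


def build_batch(operations):
    """Serialize a batch Feed with one Entry per operation."""
    feed = ET.Element(f"{{{ATOM}}}feed")
    for op_type, href, content_xml in operations:
        entry = ET.SubElement(feed, f"{{{ATOM}}}entry")
        ET.SubElement(entry, f"{{{ASBATCH}}}operation", {"type": op_type})
        ET.SubElement(entry, f"{{{ATOM}}}link", {"href": href, "rel": "edit"})
        if content_xml is not None:
            # Wrap the payload in an XHTML <div>, as in the example request.
            content = ET.SubElement(entry, f"{{{ATOM}}}content", {"type": "xhtml"})
            div = ET.SubElement(content, f"{{{XHTML}}}div")
            div.append(ET.fromstring(content_xml))
    return ET.tostring(feed, encoding="unicode")


batch_xml = build_batch(OPERATIONS)
print(batch_xml)
```

The resulting string would then be sent as the body of a PUT to the Collection's $batch URL with whatever HTTP client the application already uses.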

The HTTP response to a batch Feed is a 200 OK if the batch was interpreted and processed, even if there were errors with individual entries in the batch.

The content of the XML response is again an Atom Feed element with one Entry corresponding to each of the Entries in the original batch request. These Entries appear in the same order as in the corresponding batch request.

In each of these response Entries, AtomServer adds a custom element representing the HTTP status code that would have been returned had the Entry been submitted as a standalone operation. For successful Entries, the response is:

<asbatch:status code="200" reason="OK"/>

or

<asbatch:status code="201" reason="CREATED"/>

If an error occurs, the error code and reason are given. For example:

<asbatch:status code="404" reason="NOT FOUND"/>

or

<asbatch:status code="409" reason="Optimistic Concurrency Error"/>

Additionally, there is an <asbatch:results> element at the Feed level that indicates how many inserts, updates, deletes, and errors occurred. So, for the example above we might receive:

<asbatch:results inserts="0" updates="1" deletes="1" errors="0"/>

Examining this "rollup" report enables a client to dig in for specific errors only when the number of reported errors is non-zero.
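That client-side check might look like the following Python sketch. The response document is a hypothetical one assembled from the elements described above (the exact layout of a real AtomServer response may differ), and failed_entries is an illustrative helper name.

```python
import xml.etree.ElementTree as ET

NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "asbatch": "http://atomserver.org/namespaces/1.0/batch",
}

# Hypothetical batch response in which the delete of Entry 234 failed.
RESPONSE_XML = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:asbatch="http://atomserver.org/namespaces/1.0/batch">
  <asbatch:results inserts="0" updates="1" deletes="0" errors="1"/>
  <entry>
    <link href="/widgets/acme/123.xml" rel="edit"/>
    <asbatch:status code="200" reason="OK"/>
  </entry>
  <entry>
    <link href="/widgets/acme/234.xml" rel="edit"/>
    <asbatch:status code="404" reason="NOT FOUND"/>
  </entry>
</feed>"""


def failed_entries(response_xml):
    """Return (href, code, reason) for each failed Entry in a batch response."""
    root = ET.fromstring(response_xml)
    results = root.find("asbatch:results", NS)
    # Use the rollup to skip the per-Entry scan when nothing failed.
    if results is not None and int(results.get("errors", "0")) == 0:
        return []
    failures = []
    for entry in root.findall("atom:entry", NS):
        status = entry.find("asbatch:status", NS)
        if status is None:
            continue
        code = int(status.get("code"))
        if code >= 400:
            link = entry.find("atom:link", NS)
            href = link.get("href") if link is not None else None
            failures.append((href, code, status.get("reason")))
    return failures


for href, code, reason in failed_entries(RESPONSE_XML):
    print(f"{href}: {code} {reason}")
```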

What’s Next

We built AtomServer on AtomPub because its RESTful design affords scalability and interoperability. Building on a standard allowed us to utilize the collective wisdom of the Atom community to tackle our problems. Looking to GData for inspiration, we also added several useful extensions, such as batching and optimistic concurrency. As we expanded our AtomServer usage, the need for additional features including Auto Tagging and Aggregate Feeds became apparent. Thanks to the extensible nature of Atom, these features were easy to add without threatening the interoperability with existing Atom clients. The power of Atom truly became apparent when we were able to add these powerful features without sacrificing the simple interaction we enjoy with most of our clients. It is the best of both worlds.

Since its open source debut in May 2008, AtomServer has generated considerable interest. We’re hoping that even more people will pick up AtomServer, use it, and tell us what it’s missing! You can download AtomServer from http://www.atomserver.org, and be up and running in minutes.
