Blaze Data Services or LiveCycle Data Services?

Feb 16, 2009 9 min read

Summary

There have been a number of articles about the different versions of data services, yet none seem to clarify how to choose between them. Nor have any gone into much detail about how the endpoints and channels affect application performance.

Although Adobe refers to four different versions of Data Services, there are two primary ones. One is the open source Blaze Data Services and the other is the proprietary LiveCycle Data Services (LCDS). Both products offer the most important feature, which is connectivity between Flex and Java via a Message Broker Servlet. Both can communicate with the server using remote procedure calls and messaging through the binary protocol Action Message Format (AMF). On top of these two core products, Adobe offers supported versions and more extensive product suites.

In our experience at Gorilla Logic, the primary differences between these two products are support options and data management. The differences in performance and scalability are more debatable. LCDS offers additional endpoints and channels for client communication, and Adobe states that their primary advantage is improved scalability. But the fundamental means of communication is always AMF over HTTP, which has the same performance irrespective of the configuration of the server or client.

The other additional feature that LCDS offers is data management. This provides data synchronization between Flex clients and Java server applications with real-time conflict resolution. It also provides data assemblers and adapters for connecting data services to a persistent store via JDBC, Hibernate or another custom adapter.

So what are the four different versions?

Blaze Data Services - Free and Open Source edition
LiveCycle Data Services Community Edition - A supported version of Blaze DS
LiveCycle Data Services Single-CPU License - A free version of the commercial edition with the additional features but limited to a single CPU
LiveCycle Data Services - The paid version of the commercial edition with support

There is also a product suite that Adobe calls LiveCycle Data Services Enterprise Suite. It adds further products to the core data services to provide content services and document output, using tools such as PDF generation, forms, and digital signatures.

In theory, then, the decision can be made based on three primary questions.

Is support needed? This is most likely to matter for mission-critical applications.
Do you need data management services? This depends on whether the application requires the data synchronization and management services.
Do you need the additional endpoints and channels of LCDS? According to Adobe, if more than several hundred concurrent connections need to be made, then the additional LCDS channels or endpoints would be necessary. However, this point is debatable. The number of concurrent connections a server can handle depends on a number of factors, such as threads and I/O throughput. A larger number of concurrent connections can also be handled by load balancing across multiple servers, just as you would scale any Java application server.

A comparison chart at the end of the article gives an overview of the different versions.

Overview of the Different Endpoints - Servlet and NIO

In Data Services, an endpoint is how the server listens for connections from the client. The standard endpoint for both Blaze DS and LCDS is a servlet-based endpoint that runs inside the application server. It uses the Message Broker Servlet, configured in web.xml, to participate in the standard servlet filter chain. This allows Blaze DS or LCDS to be deployed into an existing Java application. As with any Java servlet endpoint, each client connection requires a separate thread on the server.

The NIO endpoint works entirely differently. NIO stands for Java New Input/Output. The NIO endpoint creates a standalone NIO-based socket server. The advantage of NIO is that a single thread can manage multiple I/O streams, so it requires fewer threads and can handle a larger number of clients.
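To make the "single thread, multiple I/O streams" idea concrete, here is a minimal sketch of the multiplexing pattern using the standard `java.nio` `Selector`. It is not LCDS code: the class `SelectorDemo` and its method are hypothetical, and in-process `Pipe` channels stand in for client sockets so the example runs without a network. It also assumes each message is non-empty and shorter than 256 bytes.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical demo class; it is not part of Blaze DS or LCDS. */
final class SelectorDemo {

    /**
     * Opens one in-process pipe per message (standing in for one client
     * connection each), writes the message into the pipe, then reads every
     * pipe back on the calling thread through a single Selector.
     * Assumes each message is non-empty and shorter than 256 bytes.
     */
    static List<String> readAllWithOneThread(List<String> messages) throws IOException {
        List<String> received = new ArrayList<>();
        List<Pipe> pipes = new ArrayList<>();
        try (Selector selector = Selector.open()) {
            for (String message : messages) {
                Pipe pipe = Pipe.open();
                pipes.add(pipe);
                // Channels must be non-blocking before they can be registered.
                pipe.source().configureBlocking(false);
                pipe.source().register(selector, SelectionKey.OP_READ);
                pipe.sink().write(ByteBuffer.wrap(message.getBytes(StandardCharsets.UTF_8)));
            }
            ByteBuffer buffer = ByteBuffer.allocate(256);
            while (received.size() < messages.size()) {
                selector.select(); // blocks until at least one channel is readable
                Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
                while (keys.hasNext()) {
                    SelectionKey key = keys.next();
                    keys.remove();
                    Pipe.SourceChannel source = (Pipe.SourceChannel) key.channel();
                    buffer.clear();
                    int n = source.read(buffer);
                    if (n > 0) {
                        received.add(new String(buffer.array(), 0, n, StandardCharsets.UTF_8));
                        key.cancel(); // this demo expects one message per pipe
                    }
                }
            }
        } finally {
            for (Pipe pipe : pipes) {
                pipe.sink().close();
                pipe.source().close();
            }
        }
        return received;
    }
}
```

Calling `SelectorDemo.readAllWithOneThread(Arrays.asList("alpha", "beta", "gamma"))` returns the three messages, in whatever order the selector reports the channels as ready. A servlet-style endpoint would instead dedicate one blocked thread to each of those streams.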

There are several challenges to using NIO. These are:

The client cannot access a NIO endpoint through a client side proxy.
The connection does not go through the standard servlet chain, which can break any part of the system that relies on servlet processing. For example, in one project we used a servlet to handle file uploads to the server.
With NIO you typically have to use custom authentication. This is because the socket server is running in a separate process from the servlet container.

Although the NIO server can be configured to listen on port 80, it typically resides on a separate port. This can make network configuration challenging, because the network has to allow incoming connections on a new port. This could be problematic depending on the internal network of the server and the different client networks. A potential workaround is to use a load balancer with sticky sessions.

However, the advantages of Java NIO are debatable. Java NIO was developed to allow a single thread to handle multiple connections. It does this by having a single thread iterate over a pool of connections and check whether data needs to be read or written. Since NIO was introduced in 2002 with Java 1.4, the ability of the JVM and Linux to handle a large number of threads has improved dramatically.

For example, the Linux kernel introduced a new threading library in 2.6: the Native POSIX Thread Library (NPTL). With NPTL, tests have shown the ability to start 100,000 threads on an IA-32 machine in two seconds.¹ Without NPTL, it takes the Linux kernel 15 minutes to create the same number of threads.

Most interesting is a blog post by Paul Tyma, along with posts by others, arguing that Java NIO can actually be a disadvantage.², ³, ⁴ Through a series of benchmarks, Paul makes the following arguments against NIO:

Java NIO loses throughput by 20% to 30%
Thread context switching is NOT expensive
Synchronization is NOT expensive

Based on these tests, Paul shows that one thread per connection can actually scale. Under JDK 1.6 the JVM can handle between 15K and 30K threads. This would mean the limit of the servlet endpoint is not several hundred connections. Instead it is much higher, possibly beyond 15K connections. The actual limit of course depends on the hardware configuration, such as memory and CPU.

Overview of the Different Types of Channels

Over the basic network connection it is possible to use a number of different types of channels for communication between the client and server. For basic remote procedure calls a standard AMF channel is used.

The other type of communication is messaging. This can be used to create applications that push messages from the server and communicate in near real time. Example applications are chat servers, auction clients and collaborative services.

The primary way that data services do messaging is through polling. Because standard communication over HTTP does not keep the communication channel open, the client has to keep asking for new messages. With a polling channel, the client's request can wait on the server side until data becomes available. The wait time is adjustable from a few milliseconds to several minutes. This simulates the data being pushed from the server.

There are two basic types of polling channels: short polling and long polling. The primary difference is how long the server holds the client's request while waiting for data to become available.
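The difference between short and long polling can be sketched with a per-client queue on the server. This is only an illustration under stated assumptions: `PollingMailbox`, `publish` and `poll` are hypothetical names rather than Blaze DS or LCDS API, and a real endpoint would do this inside the servlet request handling.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical per-client mailbox; the names are illustrative only. */
final class PollingMailbox {
    private final BlockingQueue<String> pending = new LinkedBlockingQueue<>();

    /** Called by server-side code when a message should reach this client. */
    void publish(String message) {
        pending.add(message);
    }

    /**
     * Handles one poll request from the client.
     * waitMillis == 0 behaves like a short poll and returns immediately;
     * a larger value parks the request (long poll) until a message
     * arrives or the wait expires.
     */
    List<String> poll(long waitMillis) throws InterruptedException {
        List<String> batch = new ArrayList<>();
        String first = pending.poll(waitMillis, TimeUnit.MILLISECONDS);
        if (first != null) {
            batch.add(first);
            pending.drainTo(batch); // also deliver anything else already queued
        }
        return batch;
    }
}
```

In this sketch, a short-polling client would call `poll(0)` on a timer and often receive an empty list, while a long-polling client would call `poll` with a wait of, say, several seconds and issue the next request as soon as the previous one returns. Note that the parked request occupies a server thread for the duration of the wait, which is where the thread-count discussion above becomes relevant.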

A more advanced channel type is streaming AMF. This opens an HTTP connection to the server and allows the server to stream an unbounded series of messages over this channel. It has the advantage of no polling overhead from the client, and it also uses standard networking configuration. This is the closest option to real-time streaming. The challenge with streaming AMF is that it uses HTTP 1.1 persistent connections, which are implemented differently by different browsers.

The final channel type is the RTMP (Real Time Messaging Protocol) channel, which is currently only available in LiveCycle DS. Adobe has recently announced it will be publishing the specification for RTMP, so presumably it will eventually work its way into other products.

RTMP was designed for streaming large amounts of multimedia and data over a duplex channel. One of the primary benefits of RTMP is that the connection with the client stays open, so it can be used to push data from the server. This allows RTMP to be used for Comet-style communications and real-time data push.

There are three flavors of RTMP. One works directly over TCP and uses port 1935. The downside to this version of RTMP is that the connection has to be initiated from the client browser. It also uses a non-standard port, so it is often blocked by client firewalls.

The other two flavors of RTMP encapsulate the RTMP message within HTTP requests. This allows the protocol to traverse firewalls and use standard ports. The two flavors are RTMPT, which goes over standard HTTP, and RTMPS, which goes over secure HTTPS.

In Flex, all calls to the server are performed asynchronously, so none of these channels affects the performance of the client. However, they can have an impact on server performance, especially if a large number of client connections are open at the same time. For example, streaming AMF could cause a large number of concurrent client connections to be open on the server and thus a large number of threads. As discussed earlier, though, the impact of a large number of threads can be minimal.

All client connections can be configured with a default channel and alternative channels to try if that channel fails. Depending on the type of communication the server is doing, a different chain of channels can be specified. For example, an RTMP channel could be specified, but if that connection fails it could fall back to a long polling channel.
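The fallback logic amounts to trying channels in preference order until one connects. The sketch below shows that logic only. In a real application the Flex client does this in ActionScript, so this Java class, the `tryConnect` callback and the channel ids are all hypothetical stand-ins for illustration.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

/** Hypothetical illustration of channel fallback; not Flex or LCDS API. */
final class ChannelFallback {

    /**
     * Returns the id of the first channel, in preference order, for which
     * the supplied connection attempt succeeds, or empty if none connects.
     */
    static Optional<String> firstConnectable(List<String> channelIds,
                                             Predicate<String> tryConnect) {
        for (String id : channelIds) {
            if (tryConnect.test(id)) {
                return Optional.of(id);
            }
        }
        return Optional.empty();
    }
}
```

For instance, given the made-up ids `"my-rtmp"`, `"my-longpolling-amf"` and `"my-polling-amf"` and a network where the RTMP port is blocked, the method would skip the first entry and settle on the long polling channel.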

Conclusion

It seems, then, that the real benefit of LiveCycle DS over Blaze DS is primarily the availability of support and data management. The benefits of the additional endpoints and channels are debatable. In the projects we have done at Gorilla Logic, we have not seen a need for NIO endpoints or RTMP. As with all technology, however, nothing is definitive. I would be interested in hearing about others' experiences in the comments.

Feature Comparison Chart

table

About the Author

Ryan Knight is a Senior Software Architect at Gorilla Logic, where he does Flex and Java consulting. He is also the primary contributor to Anvil Flex, an open source project to help enterprises jump-start their Flex development. He has worked with Java for over 12 years in a variety of roles, from development to consulting.

Resources

¹ http://www.linuxjournal.com/article/6530

² http://paultyma.blogspot.com/2008/03/writing-java-multithreaded-servers.html

³ http://www.theserverside.com/discussions/thread.tss?thread_id=26700

⁴ http://cometdaily.com/2008/11/21/are-raining-comets-and-threads/

Links from Adobe

LiveCycle Homepage

LiveCycle Data Services ES FAQ

Comparison of the different LiveCycle Data Services solutions

Links from other sources

LiveCycle ES vs LiveCycle DS vs BlazeDS - clearing up the confusion

Why are you NOT using LiveCycle DS?