Posted by Michael Yuan on Dec 08, 2011
PaaS (Platform-as-a-Service) is a type of cloud service in which the provider delivers not only on-demand hardware and operating-system services, but also application platforms and solution stacks. For developers, PaaS dramatically reduces the headache and overhead of IT deployments, and makes applications more easily scalable by providing resources to the application on-demand.
The Java platform is well suited for PaaS since the JVM, the application server, and deployment archives (e.g., WARs and EARs) provide natural isolation for Java applications, allowing multiple developers to deploy applications in the same infrastructure. However, for the past several years, most PaaS offerings were built around platforms such as Ruby and Python, whilst Google App Engine was the lone PaaS provider for Java developers. Fortunately, that is starting to change.
In the past year or so, several commercial providers have entered the Java PaaS space. It makes sense, since the estimated 10 million Java developers almost certainly represent one of the biggest developer groups in the world. In this article, we will try to compare those PaaS offerings from the developers’ point of view. Specifically, our methodology is to compare the features of each offering in four areas: the supported technology stack, developer tools, auto-scaling, and pricing and support.
In this article, we will compare the following Java PaaS offerings (in alphabetical order): Amazon Beanstalk, CloudBees, Cloud Foundry, Google App Engine, Heroku for Java, and OpenShift.
One of the most important attributes of a Java PaaS provider is the technology platform and stack it supports. After all, the technology platform is what distinguishes Java PaaS from all other PaaS offerings. Yet, during the long evolution of the Java platform, many competing technology stacks have emerged on it. For a Java PaaS vendor, I believe that supporting as many different technology stacks as possible is very important.
In this category OpenShift and CloudBees support the widest variety of technologies, from a simple servlet container (typically Tomcat) to full Java EE 6 Web Profile support (JBoss AS 7). The Java PaaS pioneer, Google App Engine, is now lagging behind most newcomers in terms of standards support. Google App Engine does not support the full Java SE platform, and hence offers poor support for many popular frameworks. Google App Engine also requires the user to program to its own network and persistence APIs, as opposed to supporting open standards, resulting in applications that are very hard to port. Similarly, Heroku for Java requires the application to wrap around its own Jetty instance, breaking the more traditional Java EE application deployment model.
The Cloud Foundry project supports the Tomcat container, but its application development and deployment are heavily optimized for the Spring framework, creating a semi-external dependency. Cloud Foundry is well suited to applications based on the Spring framework since its parent company, VMware, is also the owner of Spring. In addition, the platform supports message queuing using RabbitMQ, which is based on the AMQP standard. But its support for other Java frameworks such as Java EE is weak.
| | Amazon Beanstalk | CloudBees | Cloud Foundry | Google App Engine | Heroku for Java | OpenShift |
| Tomcat | Yes | Yes | Yes | No | No | Yes |
| Java SE | Yes | Yes | Yes | No | Yes | Yes |
| Java EE | No | Yes | No | No | No | Yes |
| Support standard Java libraries | Yes | Yes | Yes | No | Yes | Yes |
| File system access | Yes | Yes | Yes | No | Yes | Yes |
| Thread access | Yes | Yes | Yes | No | Yes | Yes |
| Outbound network connections | Yes | Yes | Yes | Limited | Yes | Yes |
| MySQL | RDS | Yes | Yes | Paid plan | Yes | Yes |
| Commercial relational databases | RDS | External | External | No | External | External |
| Big Data support | SimpleDB | External | External | BigTable | External | External |
| Deploy without special frameworks | Yes | Yes | No | No | Yes | Yes |
| Friendly to migrate existing apps | Yes | Yes | No | No | No | Yes |
| Portability of apps | High | High | Moderate | Low | Low | High |
| Production ready? | Yes | Yes | Beta | Yes | Beta | Beta |
A key value of PaaS is that it makes life easier for application developers by removing the overhead of application and resource management. So, developer friendliness and tools integration are important considerations in our evaluation.
In this category CloudBees is a clear winner. It is not only a PaaS runtime environment, but also an integrated build and test environment. Developers can make use of the Jenkins service to have CloudBees automatically and continuously check out, build, test, and report on code in the repository. This continuous integration process has been adopted by many large teams as a key component of their software development process. However, build server management is often time-consuming and painstaking work for the QA team. CloudBees takes away this pain, and makes the process much more transparent for developers. Recently, Red Hat OpenShift has made progress catching up to CloudBees in this space by supporting Maven and Jenkins integration.
Amazon Beanstalk, OpenShift, and Google App Engine all provide developer tools, SDKs, and IDE plugins that are consistent with other Java-based tools in the market.
Cloud Foundry and Heroku for Java, however, provide tools that are more suited to Ruby developers than to Java developers. Having used their tools, I suspect that many Java developers will take some time to get used to their conventions and terminology. In addition, Cloud Foundry currently suffers from poor documentation. For instance, much of its documentation is in the form of video tutorials. While video tutorials are great for getting developers started, they lack the depth required for deploying serious applications, or for developers who wish to go beyond the scripted scenarios. The official getting-started guides are dated 2007, despite the significant changes the platform has gone through in the last couple of years. More recent documentation is available, but it isn't as easy to find as it should be.
Another important point is that, while Cloud Foundry allows developers to set up their own cloud environments, deploying Micro Cloud is significantly more involved than just installing an SDK. That is a barrier that makes Cloud Foundry difficult for many developers.
| | Amazon Beanstalk | CloudBees | Cloud Foundry | Google App Engine | Heroku for Java | OpenShift |
| IDE tools | Yes | Yes | Yes | Yes | No | Yes |
| Command line tools | Yes | Yes | Yes | Yes | Yes | Yes |
| Web-based console | Yes | Yes | No | Yes | No | Yes |
| Testing on dev machine | Easy | Easy | Hard | Hard | Yes | Easy |
| Build without non-standard dependency | Yes | Yes | No | No | No | Yes |
| Source control integration | No | Yes | Yes | No | No | Partly |
| Integrated build | No | Yes | No | No | No | Yes |
| Integrated testing | No | Yes | No | No | No | No |
| Access to logs via web | No | Yes | Yes | Yes | Yes | Yes |
| Third party developer / testing services | No | Yes | No | No | No | No |
| API access | Yes | Yes | No | No | Yes | No |
| Documentation | Good | Good | Poor | Good | Good | Good |
One of the most important features of PaaS is the platform’s ability to auto-scale, that is, to increase and decrease server capacity based on real-time traffic demand. It requires the platform provider to load balance requests across a number of servers, monitor the load on each server, and spin up new servers as needed.
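The monitor-and-scale loop described above can be sketched as a simple threshold policy. This is only an illustration, not how any of the reviewed providers implement auto-scaling: the `AutoScaler` class, its thresholds, and the idea of a per-server load between 0.0 and 1.0 are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: none of these names come from a real PaaS API.
final class AutoScaler {
    private final double scaleUpLoad;   // average load above which one server is added
    private final double scaleDownLoad; // average load below which one server is removed
    private final int minServers;
    private final int maxServers;

    AutoScaler(double scaleUpLoad, double scaleDownLoad, int minServers, int maxServers) {
        this.scaleUpLoad = scaleUpLoad;
        this.scaleDownLoad = scaleDownLoad;
        this.minServers = minServers;
        this.maxServers = maxServers;
    }

    /** Returns the desired server count given the current load of each server (0.0 to 1.0). */
    int desiredServers(List<Double> loads) {
        int current = loads.size();
        double average = loads.stream().mapToDouble(Double::doubleValue).average().orElse(0.0);

        int desired = current;
        if (average > scaleUpLoad) {
            desired = current + 1;      // servers are busy: spin up one more
        } else if (average < scaleDownLoad) {
            desired = current - 1;      // servers are idle: release one
        }
        // Never go below the minimum or above the maximum pool size.
        return Math.max(minServers, Math.min(maxServers, desired));
    }

    public static void main(String[] args) {
        AutoScaler scaler = new AutoScaler(0.75, 0.25, 1, 10);
        System.out.println(scaler.desiredServers(Arrays.asList(0.9, 0.8)));
        System.out.println(scaler.desiredServers(Arrays.asList(0.1, 0.2, 0.1)));
    }
}
```

A real platform would also smooth the load over time and wait between adjustments to avoid flapping; the sketch leaves that out to keep the decision rule visible.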
All PaaS providers support auto-scaling to some extent. But auto-scaling is harder than it looks. For starters, the Java EE application must be configured to access a centralized external database as opposed to a database server co-hosted on the same server. The programming paradigm and tools for all PaaS providers need to force the developer to do that.
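One way an application can respect this constraint is to read its database location from the environment instead of hard-coding a co-hosted server. The sketch below is a minimal illustration under stated assumptions: the `DatabaseConfig` class and the `APP_JDBC_URL` variable name are hypothetical and are not taken from any provider's documentation.

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical sketch: the class and the variable name are assumptions for illustration.
final class DatabaseConfig {
    static final String JDBC_URL_VAR = "APP_JDBC_URL";

    /** Returns the JDBC URL from the given environment, rejecting a co-hosted database. */
    static String resolveJdbcUrl(Map<String, String> env) {
        String url = env.get(JDBC_URL_VAR);
        if (url == null || url.isEmpty()) {
            throw new IllegalStateException(JDBC_URL_VAR + " is not set");
        }
        // A database on the same machine disappears when that instance is scaled away.
        if (url.contains("//localhost") || url.contains("//127.0.0.1")) {
            throw new IllegalStateException("database must not be co-hosted with the app server");
        }
        return url;
    }

    public static void main(String[] args) {
        // In a deployed application the map would typically be System.getenv().
        Map<String, String> env =
                Collections.singletonMap(JDBC_URL_VAR, "jdbc:mysql://db.example.com:3306/app");
        System.out.println(resolveJdbcUrl(env));
    }
}
```

Passing the environment in as a map keeps the check easy to test without touching real process settings.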
An even bigger problem is HTTP sessions. In Java application servers, the state of HTTP sessions is managed in memory by default. To build applications that can be load balanced across different servers, the developer must do one of the following:
Of all PaaS platforms reviewed, Google App Engine handles this problem best. Google App Engine is architected to abstract away the notion of individual servers. It automatically creates data stores on separate servers, and saves HTTP sessions into the data store by default. The process is completely transparent to developers. However, the issue with Google App Engine is that raw performance is poor. It is not uncommon for a web request to take 1-3 seconds to complete a round trip to the database.
Heroku for Java also provides automatic session sharing across server instances because each of its server instances is wrapped around a custom Jetty instance. However, Heroku does not provide transparent auto-scaling. You will have to watch the dashboard and add resources to the app as needed.
For the rest of the standard Java offerings, all of them do a good job of forcing the developer to create database tables on a separate, dedicated database server as part of their deployment process. For HTTP sessions, Cloud Foundry uses sticky sessions in its load balancer. As we discussed above, while it makes life easy for developers, it also has some serious scalability issues. The rest of the PaaS offerings leave session management to application developers, although this is not always clear from their documentation.
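The session problem discussed above can be made concrete with a small simulation. It is a sketch under assumptions, not any provider's mechanism: the `Server` class stands in for an application server, and a plain map stands in for whatever shared store (a database or a distributed cache) a real deployment would use.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a toy model of two load-balanced servers, not a servlet container.
final class SessionDemo {
    /** A server that keeps session data in whatever store it is given. */
    static final class Server {
        private final Map<String, String> store;

        Server(Map<String, String> store) {
            this.store = store;
        }

        void put(String sessionId, String value) {
            store.put(sessionId, value);
        }

        String get(String sessionId) {
            return store.get(sessionId);
        }
    }

    /** Writes on one server and reads on another, as a round-robin balancer might route. */
    static String writeThenReadElsewhere(Server first, Server second) {
        first.put("session-1", "cart=3 items");
        return second.get("session-1");
    }

    public static void main(String[] args) {
        // Default behaviour: each server holds sessions in its own private memory.
        String inMemory = writeThenReadElsewhere(
                new Server(new HashMap<>()), new Server(new HashMap<>()));

        // Externalized sessions: both servers read and write one shared store.
        Map<String, String> shared = new HashMap<>();
        String external = writeThenReadElsewhere(new Server(shared), new Server(shared));

        System.out.println("in-memory sessions: " + inMemory);
        System.out.println("shared session store: " + external);
    }
}
```

With private in-memory maps the second server finds nothing for the session, while the shared store returns the value written by the first server. Sticky sessions avoid the lookup failure by always routing a user to the same server, at the cost of losing those sessions when that server is removed.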
| | Amazon Beanstalk | CloudBees | Cloud Foundry | Google App Engine | Heroku for Java | OpenShift |
| Built-in load balancer | Yes | Yes | Yes | Yes | Yes | Yes |
| Custom domain for load balancer | Yes | Yes | No | Google Apps | Yes | Yes |
| Auto-scaling of app server | Yes | Yes | Planned | Yes | No | Yes |
| Auto-scaling of database | No | No | No | Yes | No | No |
| User defined performance criteria | Yes | Yes | Planned | No | No | Yes |
| Web-based monitoring dashboard | Yes | Yes | Planned | Yes | Yes | Yes |
| Clustered HTTP session | Manual | Manual | Manual | Auto | Auto | Manual |
The pricing of those PaaS offerings is an important consideration for developers. Most service providers offer free service tiers for developers to try out. For smaller Java web sites, those free tiers are excellent choices.
However, as Google App Engine’s recent price hike controversy indicated, costs for high-volume web applications can be quite high with PaaS providers.
Another important factor to consider is the availability of support options. Google App Engine and Amazon Web Services both have poor track records in providing support. Developers are left on their own to find answers on forums. Smaller providers that specialize in Java tend to provide better technical support, even on public forums. In my view CloudBees provides the best combination of paid ticket-based support and Java-specific technical know-how amongst support staff.
| | Amazon Beanstalk | CloudBees | Cloud Foundry | Google App Engine | Heroku for Java | OpenShift |
| Free tier | Yes | Yes | N/A | Yes | Yes | Free |
| Cost for low traffic entry level web apps | High | Free | Free | Free | Free | Free |
| Cross cloud provider | No | No | Planned | No | No | Planned |
| Private cloud | No | Beta (OpenStack or vSphere) | Yes | No | No | Planned |
| Support | Forum | Email and Phone | Forum/Web Support Tickets | Forum | Email and Phone | Forum |
| Support quality | Poor | Good | Good | Poor | Okay | Good |
In this article, we reviewed six well-known vendors in the Java PaaS space. There are, of course, other smaller or lesser-known providers. Examples include
We will keep a close eye on these vendors as they could easily grow up to challenge both the market share and mind share of bigger players.
PaaS for Java has come a long way in the past 12 months. The product offerings are still evolving fast. That is great news for Java developers looking for low-cost, scalable, and hassle-free hosted solutions. For Java EE developers, I believe that CloudBees and OpenShift offer the “best of breed” services so far, and with OpenShift still in beta, CloudBees is the winner of this comparison in this highly competitive landscape. If you are willing to venture outside of the Java EE comfort zone, Heroku for Java and Cloud Foundry (beta) are worthy contenders to the venerable Google App Engine.
Dr. Michael Yuan is an entrepreneur, author, and Java enthusiast. He has published 5 books and over 40 articles on software engineering, and committed code to noted open source projects such as JBoss and Mozilla. His latest startup, Ringful Health, aims to empower patients to better engage hospital teams and improve health care outcomes using mobile and predictive analytics technologies. Ringful Health’s Java servers are deployed on Google App Engine for Java, Amazon EC2, and CloudBees.
It's worth pointing out that a couple of weeks ago OpenShift added Jenkins integration, so that's another ticked box :-)
www.redhat.com/openshift/blogs/build-test-and-d...
You can also use an embedded Tomcat on Heroku: devcenter.heroku.com/articles/create-a-java-web...
I guess you could use any embeddable server.
Hi Guys!
Take a look at a new PaaS: jelastic.com
Thanks!
Cumulogic is not tied to OpenStack. It also works for example with CloudStack as per this announcement...
www.talkincloud.com/cumulogic-paas-gets-cloudst...
Based on their documentation, it appears they work with the following:
"CumuLogic PaaS supports multiple Infrastructure-as-a-Service vendor clouds and virtualized environments: Citrix-Cloud.com CloudStack, Eucalyptus, OpenStack and VMware vSphere in private clouds, and Amazon EC2 public cloud."
All of those mentioned support EC2 REST APIs (except VMware vSphere). OpenStack supports EC2 REST APIs and you can spin up instances with Euca2ools.
You mentioned the different ways to share session state. One of the most common ways is to combine the first and second options so you get the speed of 1 with the reliability of 2 and 3.
Saying that sticky sessions have serious scalability problems might be an overstatement. It really depends on the power of your load balancer.
I doubt most people who read this article could make a BIG-IP content switch even blink.
www.f5.com/products/big-ip/
You list Amazon BeanStalk as having no IDE support.
Funny.
They have always had IDE support from day one.
aws.amazon.com/eclipse/
GAE Java has always had Eclipse support.
code.google.com/appengine/docs/java/tools/eclip...
I've used it. It works well. :) I even wrote an article about it a few years ago.
I found it odd since Amazon started the whole free tier biz that they would not have one....
New AWS customers who are eligible for the AWS free usage tier can deploy an application in Elastic Beanstalk for free, as the default settings for Elastic Beanstalk allow a low traffic application to run within the free tier without incurring charges. If these applications require more resources than the default environment provides, customers will be charged the normal AWS rates for the incremental resources the application consumes.
How do you define free tier?
Forgot the link....
aws.amazon.com/elasticbeanstalk/
This is really very good and very helpful.
Thanks Mike
Richard,
Sorry I missed the IDE support for Beanstalk and GAE. Should have looked harder (as you can probably tell, I am not a heavy Eclipse user -- I have been using GAE since Day One with 10+ apps deployed on it). We are updating the tables to reflect these.
But the "free tier" of Beanstalk is not really comparable to free tiers offered by others: It is for new customers only, has a limited length, and does not include a database. It is really just a "free trial".
cheers
Michael
I think the real issue with sticky sessions is the difficulty of scaling down the cluster. When you remove nodes from the cluster, all sessions associated with those nodes will disappear. No?
Of course, one could combine sticky sessions with other approaches as you suggested. But that negates a key benefit of sticky sessions -- no code change inside the application.
cheers
Michael
It would be great to provide definitions of what the "capabilities"/"features"/"topics" mean according to the author.
For example, "Cross cloud provider" could be interpreted in many ways. This is a good analysis, but without good explanations of what these categories mean (especially the non-obvious ones), it is an open-ended discussion that will only add more confusion and debate.
A comparative review and scorecard of Oracle, IBM, Amazon, CloudBees, RedHat, CloudFoundry, Heroku, Google, Apprenda, Microsoft, and WSO2 PaaS across 7 categories and 80+ evaluation criteria can be found in the Selecting a Cloud Platform white paper: wso2.com/casestudies/selecting-a-cloud-platform/
ActiveState's Stackato Private Platform-as-a-Service also includes strong Java support. ActiveState extended and hardened Cloud Foundry for private enterprise clouds and tailored it to work with both Java and dynamic programming languages such as Ruby, PHP, Perl and Python. Stackato uses a secure containerization approach and includes support for migrating your existing applications. For more information see: community.activestate.com/stackato