InfoQ

Debate: Why are most large-scale websites not written in Java?

Posted by Ryan Slobojan on Oct 29, 2007 11:00 PM

Community: Architecture, Java
Topics: Performance & Scalability, Enterprise Architecture, Design
Tags: LAMP, Java EE

Nati Shalom of GigaSpaces recently asked why most large-scale websites were written in languages other than Java. This question touched off a large debate in the Java community, and InfoQ took the opportunity to learn more about the major viewpoints surrounding this issue.

In his post, Shalom noted that many of the sites he knew of used a LAMP (Linux, Apache, MySQL, PHP/Perl) stack, and that several had developed custom filesystems, such as Google's GFS, or used caches such as memcached. He also noted similarities between the scalability solutions developed for large-scale web applications and those developed for large-scale financial applications:

On the Data Tier we see the following:
  1. Adding a caching layer to take advantage of memory resources availability and reduce I/O overhead
  2. Moving from a database-centric approach to partitioning, aka shards
On the Business Logic Tier:
  1. Adding parallelization semantics to the application tier (e.g., MapReduce)
  2. Moving to scale-out application models to achieve linear scalability
  3. Moving away from the classic two-phase commit and XA for transaction processing (See: Lessons from Pat Helland: Life Beyond Distributed Transactions)
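The two data-tier patterns in Shalom's list, a memory cache in front of the database and partitioning data into shards, can be sketched in a few lines. What follows is a minimal, hypothetical Python illustration, not code from any of the sites discussed: plain dictionaries stand in for memcached and for individual MySQL shards, and the class names `Shard`, `ShardedStore` and `CacheAside` are invented for the example.

```python
import hashlib


class Shard:
    """Hypothetical stand-in for one database partition (e.g. one MySQL instance)."""

    def __init__(self, name):
        self.name = name
        self.rows = {}
        self.reads = 0  # counts how often this shard was actually queried

    def get(self, key):
        self.reads += 1
        return self.rows.get(key)

    def put(self, key, value):
        self.rows[key] = value


class ShardedStore:
    """Routes each key to exactly one shard using a stable hash of the key."""

    def __init__(self, shard_count):
        self.shards = [Shard(f"shard-{i}") for i in range(shard_count)]

    def shard_for(self, key):
        # md5 rather than hash(): Python's str hash is randomized per process,
        # so it would route the same key to a different shard after a restart.
        digest = hashlib.md5(key.encode("utf-8")).digest()
        return self.shards[int.from_bytes(digest[:8], "big") % len(self.shards)]

    def get(self, key):
        return self.shard_for(key).get(key)

    def put(self, key, value):
        self.shard_for(key).put(key, value)


class CacheAside:
    """Cache-aside: read from memory first, fall back to the shard, then populate."""

    def __init__(self, store):
        self.store = store
        self.cache = {}  # hypothetical stand-in for memcached

    def get(self, key):
        if key in self.cache:
            return self.cache[key]
        value = self.store.get(key)
        if value is not None:
            self.cache[key] = value
        return value

    def put(self, key, value):
        self.store.put(key, value)
        self.cache.pop(key, None)  # invalidate so the next read refetches


if __name__ == "__main__":
    store = ShardedStore(shard_count=4)
    cached = CacheAside(store)
    for user_id in range(100):
        cached.put(f"user:{user_id}", {"id": user_id})
    cached.get("user:42")
    cached.get("user:42")  # served from the cache; no shard is queried
    print(sum(s.reads for s in store.shards))  # 1
```

Routing with a plain modulo keeps the sketch short, but adding or removing a shard remaps most keys; real deployments often use consistent hashing or a lookup directory instead. Invalidating the cache on write, rather than writing the new value into it, is a common choice because the next read repopulates the cache from the shard, which remains the source of truth.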

Shalom then asked how such similar solutions could be built on such different application stacks. One possible reason, which Shalom cited, was put forward by Todd Hoff: the LAMP stack is both powerful and free, and Java is used, but as an ancillary component rather than as the core.

Some other opinions:

  • Justin Sher was quick to point out that eBay, GMail, Amazon, hi5.com and Google AdWords are built on top of Java
  • Shane Isbell pointed to cultural differences, questioning whether the stereotypical web developer is more interested in social networking sites and 'eye candy' than the stereotypical Java developer. He also commented that financial companies had larger budgets and tended to scale with hardware, whereas web companies tended to scale with software.
  • Another person suggested that the prevalence of Java solutions in financial applications had to do with partnerships between large Java EE vendors and financial institutions
  • Angelo Andreetto, citing several years of experience with financial companies, believes that a conservative approach to potential risk leads those companies to choose Java-based solutions over heterogeneous software stacks
  • Someone else commented that the consequences of downtime for financial institutions were generally larger than for web companies
  • George Coller said that the question was misstated, and that it should really be why Java EE isn't used more

Mickey Ohayon of GigaSpaces had a more detailed response:

From a technical perspective:
  • developing in Php / Perl is very fast and simple whereas JEE is more complex
  • historically speaking the knowledge, hosting services and developers are more available
  • LAMP proved to be stable and common whereas JEE was more of an infrastructure
  • JEE requires application servers that sometimes are overkill for a web system
  • The light web languages (Php/Perl) are more flexible to changes in the short run (as part of poor architecture that is based on Non-MVC, of course in the long run the cost of any change is dramatically higher)
  • The deployment and testing of Java applications is far slower and requires relatively strong machines
From a financial perspective:
  • JEE developers are far more expensive than Perl / Php
  • The learning curve and time to market are longer
  • Hosting of JEE application servers is more expensive

Jilles Van Gurp of Nokia commented that Java EE is optimized for the enterprise domain, whose needs differ from those of a large-scale consumer-oriented website:

These websites have relatively simple data base structures; relaxed requirements for things like transactions and persistence layers (mysql + non-transactional & ACID backend is good enough in most cases); virtually no requirements for heavy duty web service stacks; etc. Basically all the stuff J2EE is excellent for is just mostly overkill for implementing consumer oriented websites. You don't need the fancy IDEs; uber-flexible messaging buses; outrageously complicated transactional logic; etc.

Instead the focus is on extreme scalability; memory usage; cpu usage; caching; etc. Those things can be addressed with off the shelf components like squid, apache, distributed linux filesystems etc. They can also be addressed with Java components too but it requires that you have some J2EE experts around to integrate them. These are not exactly easy to recruit due to current scarcity on the job market and tendency of these people to end up in extremely well paid enterprise type jobs.

Van Gurp also believes that Java is well positioned for the future:

Finally, I think all this is changing. Running the Java implementation of ruby or php can give a nice security, performance, scalability and manageability boost to your php or rails application. You'd be a fool not to try this if you are operating large scale deployments of these systems. This is still relatively unknown to php and ruby developers and quite many simply don't care about performance enough to do anything about it, instead preferring to invest in hardware. But once they make the shift to deploying on php or ruby on Java application servers, they'll discover that there is a world of additional components that can further enhance their applications. Arguably Google's web development tool chain (partially open sourced) is the state of the art in extremely large scale & rapid prototyping web development. And writing the application logic is done 100% in Java from the web developer point of view. To the best of my knowledge, Google has no large scale deployment of php or similar architectures in their web UI layer (I'd be interested to learn if this is not true).

After watching the debate unfold, Shalom described his agreement with Michael O'Keefe's opinion, which encompassed several of the viewpoints described above. Shalom also mentioned that there appeared to be a convergence trend in the market, with tools such as Spring on Rails and Caucho's Java-based PHP implementation, and that the challenge of developing a scalable site would bring LAMP stacks and Java closer together in the future.

What do you think?

6 comments

Like any market, barriers to entry is key....... by Ben Hughes Posted Oct 30, 2007 8:32 AM
Re: Like any market, barriers to entry is key....... by Luis Garcia Posted Oct 30, 2007 9:56 AM
Re: Like any market, barriers to entry is key....... by Michael Neale Posted Oct 30, 2007 6:10 PM
Re: Like any market, barriers to entry is key....... by Tom Nichols Posted Nov 1, 2007 7:52 AM
Re: Like any market, barriers to entry is key....... by Johan Compagner Posted Nov 4, 2007 7:38 AM
The correlation between large-scale websites & large-scale applications is? by James Richardson Posted Oct 30, 2007 9:08 AM
  1. One of the problems with Java-based large-scale application development is its barrier to entry: the cost of hardware to run a highly available distributed application, the cost of (good) Java developers to build it, and, crucially, the cost of learning. It's understandable that some organisations might err on the side of commodity hardware, cheaper developers and a shorter learning curve, particularly where that equally meets the business need.

    From a learning perspective, Java often lacks the 'convention over configuration' offered by the new (and convergent) frameworks. You can install the entire stack on Ubuntu with a few keystrokes; using RubyWorks, you have an out-of-the-box Rails infrastructure that will scale to meet most requirements. Where's the Java alternative? Java has grown to be infinitely configurable, and I'm sure this puts people off. Being directed what to do (convention) is certainly a lower barrier than being offered a thousand (configuration) options.

    With this lower learning overhead, we can shorten the path to understanding the meat of the scalability discussion: the 'new' architecture patterns being lived out by the likes of Facebook, eBay and Twitter.

  2. There are plenty of huge applications that are not visible to the world that run on all sorts of architectures. Just because somebody isn't shouting about it on the "blog-o-sphere" (aka. "please, i only want to be famous") doesn't mean that plenty of it isn't going about.
    Even in the case where the actual web application isn't running on java/j2ee you can bet that huge amounts of the business applications behind may well be. (or of course on MQ, Cobol, SAP or some other untrendy thing)

    Of course the fact that Gigaspaces sponsors most of this site means they will get a bunch of exposure for their GoogleJuice.

  3. Where's the Java alternative?

    I think Grails is heading in that direction.

    Java has many more framework choices than, say, Ruby, Python, or even PHP (which is getting better), which reflects its strength as a language/platform. However, one must do a lot of research into the best possible set of tools for a solution, and sometimes that prospect is just way too daunting, especially if Java was chosen for the wrong reasons.

    Hence the decision to use Rails, Zend, or whatever is a lot more palatable; developers for those frameworks are a lot cheaper and tend to be able to whip up decent solutions quickly.

    Horses for courses really. I personally would lean towards a java-based platform simply for its robustness and all the nice services you get from the EE stack. And the other goodies like Spring, commons, JMX etc.

  4. There is also a desire just to "keep it simple" for web apps whose primary purpose is to quickly build an interface to some sort of database. Layers are not needed. PHP shines at this, and it relies on the OS for any services it needs. Rails is somewhat further up the "software engineering ladder" and provides a lot more structure. In both cases the frameworks don't try to do much; they lean on the OS and let you call out to it in the rare case that you go beyond shuffling data into and out of a database. I can appreciate that simplicity.

  5. Agreed. Grails is the first (Java) solution I'm aware of that brings it all together so you can just start writing a web application without tons of configuration crap. AppFuse is pretty close too, but in my experience I was still dealing with a lot of different frameworks that didn't always play nice together.

    Web designers are generally not software engineers first so they tend to pick up a simpler programming language. RoR is the first framework that seems to be written from the web designer perspective, looking to give a great application solution. Grails is coming from the Java software engineer perspective looking to give developers a great web app solution. They both seem to be finding that sweet spot right in the middle.

  6. If you want a Java framework that doesn't have configuration crap, look at Wicket: wicket.apache.org
