High-Volume / Scalable Architectures with vert.x - interview with Eberhard Wolff

by Ralph Winzinger on May 20, 2013

Web 2.0 and the massive growth of mobile clients changed the way we have to think about application architectures. Node.js was one of the first technologies that tried to address this challenge by making use of a non-blocking, asynchronous runtime environment for server-side software based on JavaScript. Last year, vert.x was introduced - a similar runtime realized within the Java virtual machine. In contrast to Node.js, vert.x follows a true polyglot approach and allows developers to build their systems with JavaScript, Groovy, Java, and other languages, even mixed within one application.

InfoQ got in touch with Eberhard Wolff to discuss the differences between those two technologies, the challenges that emerge from building architectures based on them and the benefits that those architectures provide.

Eberhard was formerly principal consultant for SpringSource and is now architecture and technology manager for adesso AG. He has been analyzing and designing enterprise systems for years and has spent the last few months researching the field of vert.x.

InfoQ: Node.js is now about four years old, vert.x about two, and it is becoming apparent that there is a certain need for those technologies and concepts. What is your point of view regarding Node.js or vert.x? Where is it useful to apply these technologies? Which use cases are they made for?

Eberhard Wolff: I think there is an increasing challenge to handle a large number of clients in a high performance environment. An asynchronous model helps there - it is simply possible to handle a lot more clients with fewer threads. But there is more to vert.x. On the JVM, you usually only have one programming language in an application. Java is still the dominant and preferred programming language. Then, there's Scala with its own frameworks or for example Clojure. This is different from what's going on in .NET. From the very beginning, .NET was positioned to be polyglot - even in the same application. vert.x allows different languages to cooperate on the JVM more easily. There are also wrappers which make using the framework feel natural in every language. You can run JavaScript code on your JVM and by transferring JSON objects on the vert.x event bus you can integrate it with different languages like Java or Scala. It's an interesting evolution because it's the only approach for the JVM that I am aware of that is truly polyglot. And then there's a challenge on the JVM concerning modularity: How do you make individual modules separately deployable? One possibility would be to use OSGi. However, it is still not very broadly used. Another way would be to deploy various WAR files on one server and let them communicate via SOAP - but that is a rather complex solution with considerable performance overhead. vert.x offers an elegant approach to modularization. So vert.x is a very interesting framework because it is asynchronous, polyglot and modular.

 

InfoQ: Speaking of Node.js, I'm actually writing only one application which will be run in the Node.js server. It reacts in a single-threaded way to requests and has the ability to hand those requests down to a thread pool to do some follow-up processing if necessary. That means that I do not have any modularity or different components as a higher-level concept that would be able to communicate with each other. How does vert.x work here in contrast?

Eberhard: In vert.x an event bus is used by the modules - the various deployment units - to send events to each other. Those events can for example contain JSON objects. The modules in vert.x have separate classloaders, so they can really be seen as separate components. So you can put functionality into different modules. For example, there's a module available on GitHub for interfacing with MongoDB. It consumes the appropriate events from the event bus and stores the data in the database. Compared to asynchronous programming it looks like a minor feature at first glance but I believe that this is really important. It addresses problems you actually face quite often within enterprise applications.

 

InfoQ: Could you go into a little bit of detail concerning the event bus? Do I have to think of it as a kind of Java Message Service (JMS) with topics and queues?

Eberhard: It provides publish/subscribe as well as point-to-point communication. It can also be distributed across multiple nodes. However, events are transient. They are not stored on any kind of stable storage. Therefore, events might be lost. This is different from JMS, which puts the emphasis on reliability, while the vert.x event bus puts the emphasis on performance.
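To make the two delivery modes and the "transient" property concrete, here is a minimal in-memory sketch in plain Java. It is not the vert.x API: the class `TransientBus`, its method names, and the round-robin choice for point-to-point delivery are assumptions made for illustration only. Messages are plain JSON strings, and nothing is ever stored.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

/** Hypothetical in-memory bus; it illustrates the semantics only, not the vert.x API. */
public class TransientBus {
    private final Map<String, List<Consumer<String>>> handlers = new ConcurrentHashMap<>();
    private final Map<String, AtomicInteger> cursors = new ConcurrentHashMap<>();

    /** Registers a handler for an address. */
    public void register(String address, Consumer<String> handler) {
        handlers.computeIfAbsent(address, a -> new CopyOnWriteArrayList<>()).add(handler);
    }

    /** Publish/subscribe: every registered handler receives the message. Returns the number of deliveries. */
    public int publish(String address, String json) {
        List<Consumer<String>> targets = handlers.getOrDefault(address, List.of());
        targets.forEach(handler -> handler.accept(json));
        return targets.size();
    }

    /** Point-to-point: exactly one handler receives the message. Returns false if it was dropped. */
    public boolean send(String address, String json) {
        List<Consumer<String>> targets = handlers.getOrDefault(address, List.of());
        if (targets.isEmpty()) {
            return false; // transient: no receiver means the message is lost, nothing is queued
        }
        int next = cursors.computeIfAbsent(address, a -> new AtomicInteger()).getAndIncrement();
        targets.get(Math.floorMod(next, targets.size())).accept(json); // round-robin is an assumption
        return true;
    }

    public static void main(String[] args) {
        TransientBus bus = new TransientBus();
        System.out.println("delivered before registration: " + bus.send("orders", "{\"id\":0}"));
        bus.register("orders", json -> System.out.println("handler A: " + json));
        bus.register("orders", json -> System.out.println("handler B: " + json));
        bus.publish("orders", "{\"id\":1}"); // both handlers see it
        bus.send("orders", "{\"id\":2}");    // only one handler sees it
    }
}
```

The `send` call made before any handler is registered returns `false`, which is the behaviour described above: without stable storage, a message for an absent receiver simply disappears.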

 

InfoQ: How independent are those modules from each other? Can you really choose one of the components and exchange or upgrade it during production? Are the messages for my module queued until it's back up and running?

Eberhard: The modules can be exchanged or upgraded in production. But the events are not persisted. So the events would be dropped as long as the receiver is down.

 

InfoQ: What does a system landscape look like that is used in this environment? After all, I can't say that I want to be high performing and highly scalable on one side when on the other side I'm integrating systems that won't fit in these concepts.

Eberhard: Well, that's another advantage of vert.x: You're able to define so-called worker verticles. Worker verticles use a thread pool, so you can integrate legacy code that needs to do blocking IO. The thread pool is separate from the primary event loop. So the event loop will not be blocked by a worker verticle. Then a worker can do whatever task it has to do - synchronous, blocking IO or whatever. This feature is actually quite important since there are a lot of libraries in the Java space that are not designed to be non-blocking.
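The split between an event loop and a separate worker pool can be sketched with nothing but `java.util.concurrent`. This is not how vert.x implements worker verticles; the class `WorkerOffload`, the method `executeBlocking`, the pool size of four and the thread names are all assumptions chosen for illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical sketch of the event-loop / worker-pool split; not the vert.x API. */
public class WorkerOffload {
    // A single thread plays the role of the event loop; it must never block.
    private final ExecutorService eventLoop =
            Executors.newSingleThreadExecutor(r -> new Thread(r, "event-loop"));
    // A separate pool absorbs blocking calls (JDBC, file IO, legacy libraries).
    private final ExecutorService workers =
            Executors.newFixedThreadPool(4, r -> new Thread(r, "worker"));

    /** Runs blockingCall on a worker thread, then hands its result to the callback on the event loop. */
    public <T> CompletableFuture<Void> executeBlocking(Supplier<T> blockingCall, Consumer<T> onEventLoop) {
        return CompletableFuture.supplyAsync(blockingCall, workers)
                .thenAcceptAsync(onEventLoop, eventLoop);
    }

    /** Stops both pools so the JVM can exit. */
    public void shutdown() {
        eventLoop.shutdown();
        workers.shutdown();
    }

    public static void main(String[] args) {
        WorkerOffload offload = new WorkerOffload();
        offload.executeBlocking(
                () -> {
                    // Stand-in for a blocking legacy call.
                    try { Thread.sleep(100); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
                    return "row from legacy database";
                },
                result -> System.out.println(Thread.currentThread().getName() + " received: " + result)
        ).join();
        offload.shutdown();
    }
}
```

While the worker sleeps, the event-loop thread stays free to process other callbacks; it is only touched again once the result is ready.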

 

InfoQ: How well can this be assessed in advance when you are designing a new system? I once read an interview with Ryan Dahl, the creator of Node.js, where he stated that choosing JavaScript was a good idea, because there were only a few modules around. But "only a few modules" also meant that there couldn't be too many wrong modules from a conceptual point of view - modules that are not designed to be used asynchronously. I think if you just chose an arbitrary Java library, you would probably run into trouble quite soon, wouldn't you?

Eberhard: Exactly. But that is where worker verticles come in. I understand the point Node.js makes, simply because JavaScript programmers are used to working with asynchronous concepts. They have been doing asynchronous programming with AJAX for ages. There are - or at least have been - voices within the Java community that say "we understand how to build large systems and JavaScript people just build simple frontends". But with AJAX, the JavaScript guys always had to deal with callbacks and async concepts and it's quite natural to them. Therefore, it's also quite natural to build something like Node.js based on JavaScript, now that we have high performing JavaScript environments. vert.x shows this, too: Looking at the vert.x examples, in Java there are anonymous inner classes instead of functions in JavaScript. That just leads to so much more code. In my opinion, that's one of the classical design errors in the Java language. The idea of anonymous inner classes was just bad. But the problem will be solved with Java 8.
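The difference in verbosity is easy to see side by side. The sketch below uses `java.util.function.Consumer` as a stand-in callback type; that interface itself only arrived with Java 8, so real pre-Java-8 code would use a framework-specific handler interface instead. The class and method names are hypothetical.

```java
import java.util.function.Consumer;

/** Hypothetical comparison of two ways to write the same callback. */
public class HandlerStyles {
    /** Pre-Java-8 style: the callback needs a whole anonymous inner class. */
    public static Consumer<String> anonymous(final StringBuilder log) {
        return new Consumer<String>() {
            @Override
            public void accept(String message) {
                log.append(message);
            }
        };
    }

    /** Java 8 style: the same callback written as a lambda expression. */
    public static Consumer<String> lambda(StringBuilder log) {
        return message -> log.append(message);
    }

    public static void main(String[] args) {
        StringBuilder log = new StringBuilder();
        anonymous(log).accept("ping ");
        lambda(log).accept("pong");
        System.out.println(log); // both handlers appended to the same log
    }
}
```

Both methods return a handler with identical behaviour; only the amount of ceremony differs.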

 

InfoQ: You mentioned that the Java community tells the JavaScript programmers to "go and build front-ends". Is there a little disadvantage on the side of Node.js because there are a lot of people who are able to program in JavaScript but the systems one is able to build on Node.js actually require a different conceptual background? Do we run the risk of seeing people designing systems without having proper enterprise knowledge because of these new languages that enter the enterprise market?

Eberhard: Let me put it this way: The Java community traditionally builds back-end systems and is used to working on large systems. A lot of the ideas for designing large systems emerged from the Java community. But I wouldn't judge a developer by the language he happens to be using. And regarding asynchronous concepts - they are simply in the blood of JavaScript developers. For large systems modularization is an important topic. vert.x is providing an interesting alternative here. With proper modularization, you can cut a large system down to smaller pieces. And even more interesting, the concept of modularization is message based, which provides additional advantages. You will achieve looser coupling since the systems only depend on messages but not on calling specific methods.

 

InfoQ: Right. And it helps even more that by using JSON we have a message format that is quite "loose" per se and helps to avoid compatibility problems.

Eberhard: That's true. But JSON is only one of the technical possibilities. JSON does not offer any kind of schema definition. Actually, some say that schemata are bad because they make things inflexible. They claim you end up in a waterfall model, where you design interfaces as a first step and build your system based on these afterwards. Nevertheless, I think it would be nice if you could check whether JSON messages are really structured the way you expect them to be. And that's where I recognize a problem with JSON. Usually, you want some kind of contract which defines the expected structure of the data. And I think you can benefit from XML schemata in such situations.

 

InfoQ: Actually, I had quite a funny experience with dynamic data types a few months ago. I hosted a training where we coded some kind of Twitter clone, based on PhoneGap, Node.js and MongoDB. At one point, the front-end guys had the idea to add images to their "tweets". Those images were transferred through the Node server into the database and later back to the front-end. When the front-end team showed off this feature, the Node and database teams were quite surprised, since they had not realized that they were handling images now. That was really cool in this situation, but I think this can also lead to major problems.

Eberhard: That's exactly the advantage of flexible schemata. But if two or more teams are involved in development you usually want to have some kind of interface contract. To enforce that contract, you need some kind of schema. What you described shows another interesting point: Where do you have to interpret data? In the stack you mentioned, you are simply not interested in checking or verifying the data in most of the system. You just want to store JSON documents and that's it. What's inside the document is not relevant because you don't need to interpret the semantics of the data. Maybe that's different in enterprise systems. If you say "that's a customer", then you need to know what the data of a customer looks like.

 

InfoQ: Well, there is a little bit of JavaScript or dynamic typing ideology behind that. When it walks and swims and quacks like a duck, then I call it a duck. If it looks like the type that I expect and it contains all the data I need, then I don't care about the rest.

Eberhard: True. But there are issues. When you get a customer object and you expect it to have a date of birth, then you can define that in the schema. That is where XML schemata excel, because they have a highly sophisticated type system. So you can even define what an order number looks like as a regular expression. And if somebody violates the schema, he might run into trouble. The schema tells him in advance that his data is considered invalid. He can use that as an early warning.
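The kind of contract check described here can be approximated by hand when messages arrive as schemaless JSON. The following plain Java sketch validates an already-parsed message represented as a `Map`; it is not an XML Schema or any real validation library. The field names `dateOfBirth` and `orderNumber` and the order number format `ORD-` plus six digits are invented for illustration.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical hand-written contract for a customer message. */
public class CustomerContract {
    // Assumed format for illustration: "ORD-" followed by exactly six digits.
    private static final Pattern ORDER_NUMBER = Pattern.compile("ORD-\\d{6}");

    /** Returns a list of contract violations; an empty list means the message is valid. */
    public static List<String> validate(Map<String, Object> message) {
        List<String> errors = new ArrayList<>();

        Object dateOfBirth = message.get("dateOfBirth");
        if (!(dateOfBirth instanceof String)) {
            errors.add("dateOfBirth is required");
        } else {
            try {
                LocalDate.parse((String) dateOfBirth); // expects ISO format, e.g. 1980-02-29
            } catch (DateTimeParseException e) {
                errors.add("dateOfBirth must be an ISO date");
            }
        }

        Object orderNumber = message.get("orderNumber");
        if (!(orderNumber instanceof String) || !ORDER_NUMBER.matcher((String) orderNumber).matches()) {
            errors.add("orderNumber must match " + ORDER_NUMBER.pattern());
        }
        return errors;
    }

    public static void main(String[] args) {
        Map<String, Object> incomplete = Map.of("orderNumber", "42");
        // The sender learns about both problems before the data reaches business logic.
        validate(incomplete).forEach(System.out::println);
    }
}
```

The early warning is the list of violations: a sender can run the same check before putting a message on the bus.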

 

InfoQ: Is it always JSON, that is used on the vert.x event bus?

Eberhard: It's possible to use arbitrary objects. However, if you use Java objects, you would run into serialization and classloading troubles. That's why JSON is the preferred way in this context and it's the way it's usually done.

 

InfoQ: Regarding technologies like Node.js or vert.x - would you say that architectures for large applications are based on those technologies or would you rather say that an architecture just uses those technologies for certain areas? As far as I can remember, LinkedIn handles traffic for mobile devices via Node.js but the application is per se a "normal" enterprise application.

Eberhard: At the moment, I think it might take a while until those technologies are used broadly in classic enterprise systems. I can see a difference to Spring at this point: It was clear how Spring fits into enterprise systems. There was a clear migration path. It is somewhat different for vert.x. vert.x can be embedded in other applications. But to get its full power you need to use its runtime environment. This environment is quite different from traditional enterprise Java. There's just a Java process and no servlet or application container. However, there is a trend towards asynchronous systems. There are Erlang, Scala and Akka and frameworks like Spring Integration and Apache Camel - each one coming with different approaches and a different focus. Spring Integration or Apache Camel for instance provide various adapters, send asynchronous messages and process data. So they provide an integration solution - just as shown in the book "Enterprise Integration Patterns". The idea of Erlang was to achieve high performance and reliability. vert.x is similar. So working asynchronously is a specific way to build systems, that is getting more and more important.

 

InfoQ: What do you think one should not try to build with such technologies? Where are the pitfalls that I would hit?

Eberhard: The question is in what context those solutions are particularly useful. I think the sweet spot is building high performance systems, especially with a huge number of clients. Or integration scenarios and other scenarios that benefit from the loose coupling. To put it another way: If you want to build an ordinary web application, it might not be the best solution.

 

InfoQ: How can I use existing frameworks in vert.x? Can I just choose any existing framework and for instance use some front-end framework like Apache Wicket to deliver web-pages?

Eberhard: Unfortunately it's not that easy because most front-end frameworks are based on the servlet API. The servlet API is blocking and consequently not compatible with vert.x. Thus, you would design a solution using a templating engine and take care of delivering HTML pages yourself. But a more typical use case would be to build back-end systems using JSON and REST interfaces for JavaScript front-ends. You end up having more logic in your front-end and less of it in the back-end. Not necessarily less business logic but for example logic for rendering HTML pages.

 

InfoQ: You mentioned production environments - do you think there will be more use of cloud-based scenarios in this context? Or is there actually not too much of a difference, but our operations teams have to adapt to the new paradigms?

Eberhard: It depends on what exactly you mean by "cloud". The problem with most of the PaaS clouds is that they just offer the servlet API and thus you face the situation we were talking about before. It's no problem to deploy vert.x applications on an IaaS, but I don't see a huge benefit for vert.x over other Java technologies in this area.

 

InfoQ: Let's talk a little bit about support during development time. If you have Node.js in mind, you can find IDEs that support JavaScript better or worse - but all in all, it still seems to be far from what we are used to when developing Java software. Regarding QA, it seems to be the same: there is some improvement, but it's still not very much. What's the situation for vert.x here?

Eberhard: What is great about vert.x is that you can use hot redeploy mechanisms. That makes development a lot easier. And you have the advantage that vert.x compiles and executes any source file you provide. There is no Eclipse plug-in, however.

 

InfoQ: Besides that, I suppose you can use everything that is present for the featured languages. For example infrastructure for testing or automatic code reviews for the parts you decided to use Java?

Eberhard: You can do this, yes. But as I said before, the concepts for modularization and deployment are different, so you're facing some challenges here.

 

InfoQ: What about debugging in vert.x? Is it possible to just attach to the running process?

Eberhard: Sure, that feature is provided by the JVM. Using the JVM gives quite a few benefits, actually. Code is executed on a highly optimized virtual machine that executes bytecode. And I think it's easier to create an optimized VM for bytecode than it is for JavaScript for example. What the V8 team has built is astonishing and very, very exciting. But there was a huge amount of engineering that was invested in the JVM. And I think it's a good idea to count on it and use it. There are measurements which state that vert.x is faster than Node.js by whatever factor, but like every other benchmark, that's hard to interpret.

 

InfoQ: Yes, that's true. But it actually sounds quite logical. On the one hand, what the V8 team created in the past few years is really great. But on the other hand, we have about 20 years of JVM development. And you have to say, that there are some concepts the V8 engine relies on to be performant. But if I do not know these concepts, I can write syntactically correct code, which is not bad per se, but contradicts those internal concepts resulting in low performance. I think that this can't happen in Java in the same way. I surely can write bad code with stupid loops, but that's another type of error.

Eberhard: But this is a little bit like comparing apples with oranges. Actually, I would have to compare the whole stack - Rhino and the JVM for JavaScript on vert.x and V8 for Node.js. Support for dynamic languages hasn't been in the JVM from the beginning. Another thing is that you can use multiple event loops in vert.x, usually one for each CPU core. That seems not to be the case with V8. Thus, if I have an eight-core server, I need to start multiple Node servers, while I would only start one JVM with eight event loops for vert.x. That's different and possibly more efficient. But as I said before, benchmarking is a complex topic.
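The idea of several event loops inside one JVM process can be sketched as an array of single-threaded executors, one per core. This is an illustration of the concept only and says nothing about how vert.x actually schedules work; the class `EventLoopGroup` and the rule of pinning a connection to a loop by its id are assumptions.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical group of event loops living in a single JVM process. */
public class EventLoopGroup {
    private final ExecutorService[] loops;

    public EventLoopGroup(int count) {
        loops = new ExecutorService[count];
        for (int i = 0; i < count; i++) {
            loops[i] = Executors.newSingleThreadExecutor(); // each loop is exactly one thread
        }
    }

    /** One loop per CPU core, as described in the interview. */
    public static EventLoopGroup onePerCore() {
        return new EventLoopGroup(Runtime.getRuntime().availableProcessors());
    }

    /** Pins a connection to one loop so that all of its handlers run on the same thread. */
    public ExecutorService loopFor(int connectionId) {
        return loops[Math.floorMod(connectionId, loops.length)];
    }

    public int size() {
        return loops.length;
    }

    /** Stops all loops so the JVM can exit. */
    public void shutdown() {
        for (ExecutorService loop : loops) {
            loop.shutdown();
        }
    }

    public static void main(String[] args) {
        EventLoopGroup group = EventLoopGroup.onePerCore();
        for (int connection = 0; connection < group.size(); connection++) {
            int id = connection;
            group.loopFor(id).execute(() ->
                    System.out.println("connection " + id + " handled on " + Thread.currentThread().getName()));
        }
        group.shutdown(); // already submitted tasks still run before the threads stop
    }
}
```

Because every connection stays on one thread, handlers for that connection need no locking among themselves, while the process as a whole still uses all cores.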

 

InfoQ: Coming to an end and to sum up... what are the main benefits of vert.x? You can't hide your interest in this topic.

Eberhard: Well, there are several reasons why I think it's an exciting technology and I consider it to be important. First, there is asynchronous IO which will matter for the JVM in the future. Then there's a better concept for modularization which I think enterprise Java lacks. This is a problem for a considerable number of projects and it is hard to come up with a good solution. And lastly I think it's relevant because in my opinion, the JVM as a polyglot VM will get more important. I don't think that the Java language alone is necessarily the future of the JVM. vert.x helps because it is truly polyglot. The challenge is that to get its full power, servlet and application containers have to be exchanged for the vert.x runtime.

 

 

Eberhard Wolff is a founding member of the Java Champions, author of several articles and books and also a regular speaker at international conferences. His background is in Enterprise Java, Spring, Cloud and NoSQL. He is working as the architecture and technology manager for adesso AG in Berlin, Germany.

THX and...what about memory footprint? by stefano fago

THX!... Very interesting interview... I'd like to report another resource [blog.andrewvc.com/vertx-node-on-ropes] and ask a little question: Node.js can be less memory consuming than vert.x, so, from a different use-case point of view, it is chosen for process-oriented architectures (for example Wooga) rather than thread-oriented ones... What do you think about it?

Re: THX and...what about memory footprint? by Ralph Winzinger

Hi Stefano,

interesting point ... basically, I also think that Node.js could be more memory efficient. But given the fact that those solutions are designed to handle many, many thousands of parallel connections, I don't think that there's too much of a difference in the long run. Do you think of special scenarios where Node would be better?

Bye, Ralph
