High-Volume / Scalable Architectures with vert.x - interview with Eberhard Wolff
InfoQ got in touch with Eberhard Wolff to discuss the differences between those two technologies, the challenges that emerge from building architectures based on them and the benefits that those architectures provide.
Eberhard was formerly principal consultant for SpringSource and is now architecture and technology manager for adesso AG. He has been analyzing and designing enterprise systems for years and has spent the last few months researching the field of vert.x.
InfoQ: Node.js is now about four years old, vert.x about two and it becomes apparent that there is a certain need for those technologies and concepts. What is your point of view regarding Node.js or vert.x? Where is it useful to apply these technologies? Which use cases are they made for?
InfoQ: Speaking of Node.js, I'm actually writing only one application which will be run in the Node.js server. It reacts in a single-threaded way to requests and has the ability to reach down those requests to a thread pool to do some follow-up processing if necessary. That means, that I do not have any modularity or different components as a higher level concept, that would be able to communicate with each other. How's vert.x working here in contrast?
Eberhard: In vert.x an event bus is used by the modules - the various deployment units - to send events to each other. Those events can for example contain JSON objects. The modules in vert.x have separate classloaders, so they can really be seen as separate components. So you can put functionality into different modules. For example, there's a module available on Github for interfacing with MongoDB. It consumes the appropriate events from the event bus and stores the data to the database. Compared to asynchronous programming it looks like a minor feature at first glance but I believe that this is really important. It addresses problems you actually face quite often within enterprise applications.
InfoQ: Could you go a little bit into detail concerning the event bus? Do I have to think of it as a kind of Java Messaging System with topics and queues?
Eberhard: It provides publish/subscribe as well as point-to-point communication. It can also be distributed across multiple nodes. However, events are transient. They are not stored on any kind of stable storage. Therefore, events might be lost. This is different from JMS that put the emphasis on reliable while the vert.x event bus puts the emphasis on performance.
InfoQ: How independent are those modules from each other? Can you really choose one of the components and exchange or upgrade it during production? Are the messages for my module queued until it's back up and running?
Eberhard: The modules can be exchanged or upgrade in production. But the events are not persisted. So the events would be dropped as long as the reveiver is down.
InfoQ: What does a system landscape look like that is used in this environment? After all, I can't say that I want to be high performing and high scalable on one side when on the other side I'm integrating systems that won't fit in these concepts.
Eberhard: Well, that's another advantage of vert.x: You're able to define so called worker verticles. Worker verticles use a thread pool, so you can integrate legacy code that needs to do blocking IO. The thread pool is separate from the primary event loop. So the event loop will not be blocked by a worker verticle. Then a worker can do whatever task it has to do - synchronous, blocking IO or whatever. This feature is actually quite important since there are a lot of libraries in the Java space that are not designed to be non-blocking.
InfoQ: Right. And it's even more facilitating that by using JSON we have a message format that is quite "loose" per se and helps to avoid compatibility problems.
Eberhard: That's true. But JSON is only one of the technical possibilities. JSON does not offer any kind of schema defintion. Actually, some say that schemata are bad because they make things inflexible. They claim you end up in a waterfall model, where you design interfaces as a first step and build your system based on these afterwards. Nevertheless, I think it would be nice, if you could check whether JSON messages are really structured the way I expect them to be. And that's where I recognize a problem with JSON. Usually, you want some kind of contract which defines the expected structure of the data. And I think, you can benefit from XML schemata in such situations.
InfoQ: Actually, I had quite a funny experience with dynamic data-types a few months ago. I hosted a training where we coded some kind of Twitter clone, based on PhoneGap, Node.js and MongoDB. At one point, the front-end guys had the idea, to add images to their "tweets". Those images were transferred through the Node server into the database and later back to the front-end. When the front-end team showed off this feature, the node and database teams were quite surprised, since they did not recognize that they were handling images now. That was really cool in this situation, but I think, this can also lead to major problems.
Eberhard: That's exactly the advantage of flexible schemata. But if two or more teams are involved in development you usually want to have some kind of interface contract. To enforce that contract, you need some kind of schema. What you described shows another interesting point: Where do you have to interprete data? In the stack you mentioned, you are simply not interested in checking or verifying the data in most of the system. You just want to store JSON documents and that's it. What's inside the document is not relevant because you don't need to interprete the semantics of the data. Maybe that's different in enterprise systems. If you say "that's a customer", then you need to know, what the data of a customer looks like.
Eberhard: True. But there are issues. When you get a customer object and you expect it to have a date of birth, then you can define that in the schema. That is where XML schemata excel, because they have a highly sophisticated type system. So you can even define what an order number looks like as a regular expression. And if somebody violates the schema, he might run into troubles. The schema tells him his data is considered invalid in advance. He can use that as an early warning.
InfoQ: Is it always JSON, that is used on the vert.x event bus?
Eberhard: It's possible to use arbitrary objects. However, if you use Java objects, you would run into serialization and classloading troubles. That's why JSON is the preferred way in this context and it's the way it's usually done.
InfoQ: Regarding technologies like Node.js or vert.x - would you say that architectures for large applications are based on those technologies or would you rather say that an architecture just uses those technologies for certain areas? As far as I can remember, LinkedIn handles traffic for mobile devices via Node.js but the application is per se a "normal" enterprise application.
Eberhard: At the moment, I think it might take a while until those technologies are used broadly in classic enterprise systems. I can see a difference to Spring at this point: It was clear how Spring fits into enterprise systems. There was a clear migration path. It is somewhat different for vert.x. vert.x can be embedded in other applications. But to get its full power you need to use its runtime environment. This environment is quite different from traditional enterprise Java. There's just a Java process and no servlet or application container. However, there is a trend towards asynchronous systems. There are Erlang, Scala and Akka and frameworks like Spring Integration and Apache Camel - each one coming with different approaches and a different focus. Spring Integration or Apache Camel for instance provide various adapters, send asynchronous messages and process data. So they provide an integration solution - just as shown in the book "Enterprise Integration Patterns". The idea of Erlang was to achieve high performance and reliability. vert.x is similar. So working asynchronously is a specific way to build systems, that is getting more and more important.
InfoQ: What do you think, one should not try to realize with such technologies? Where are the pitfalls that I would hit?
Eberhard: The question is, in what context are those solutions particular useful. I think the sweet spot is building high performance systems, especially with a huge number of clients. Or integration scenarios and other scenarios that benefit from the loose coupling. To put it another way: If you want to build an ordinary web application, it might not be the best solution.
InfoQ: How can I use existing frameworks in vert.x? Can I just choose any existing framework and for instance use some front-end framework like Apache Wicket to deliver web-pages?
InfoQ: You mentioned production environments - do you think there will be more use of cloud based szenarios in this context? Or is there actually not too much of a difference, but our operations teams have to adopt the new paradigms?
Eberhard: It depends on want you exactly mean by "cloud". The problem with most of the PaaS clouds is, that they just offer the servlet API and thus you face the situation we were talking about before. It's no problem to deploy vert.x applications on a IaaS, but I don't see a huge benefit for vert.x over other Java technologies in this area.
Eberhard: What is great about vert.x is that you can use hot redeploy mechanisms. That makes development a lot easier. And you have the advantage that vert.x compiles and executes any source file you provide. There is no Eclipse plug-in, however.
InfoQ: Besides that, I suppose you can use everything that is present for the featured languages. For example infrastructure for testing or automatic code reviews for the parts you decided to use Java?
Eberhard: You can do this, yes. But as I said before, the concepts for modularization and deployment are different, so you're facing some challenges here.
InfoQ: What about debugging in vert.x? Is it possible to just attach to the running process?
InfoQ: Yes, that's true. But it actually sounds quite logical. On the one hand, what the V8 team created in the past few years is really great. But on the other hand, we have about 20 years of JVM development. And you have to say, that there are some concepts the V8 engine relies on to be performant. But if I do not know these concepts, I can write syntactically correct code, which is not bad per se, but contradicts those internal concepts resulting in low performance. I think that this can't happen in Java in the same way. I surely can write bad code with stupid loops, but that's another type of error.
InfoQ: Coming to an end and as a sum up... what are the main benefits of vert.x? You can't hide your interest for this topic.
Eberhard: Well, there are several reasons why I think it's an exciting technology and I consider it to be important. First, there is asynchronous IO which will matter for the JVM in the future. Then there's a better concept for modularization which I think enterprise Java lacks. This is a problem for a considerable number of projects and it is hard to come up with a good solution. And lastly I think it's relevant because in my opinion, the JVM as polyglot VM will get more important. I don't think that the Java language alone is necessarily the future of the JVM. vert.x helps because it is true polyglot. The challenge is, that for its full power servlet and application containers have to be exchanged for the vert.x runtime.
Eberhard Wolff is founding member of the Java Champions, author of several articles and books and also a regular speaker at international conferences. His background is in Enterprise Java, Spring, Cloud and NoSQL. He is working as the architecture and technology manager for adesso AG in Berlin, Germany.
THX and...what about memory footprint?
Re: THX and...what about memory footprint?
interesting point ... basically, I also think that Node.js could be more memory efficient. But given the fact, that those solutions are designed to handle many, many thousand parallel connections, I don't think that there's too much of a difference in the long run. Do think of special scenarios where Node would be better?