Hardware friendly, high performance Java-Applications


1. I’m Michael Hunger from InfoQ and I am sitting here with Martin Thompson and Dave Farley from LMAX, speaking about trading platform performance, mechanical sympathy, the Disruptor pattern - lots of things to talk about. Welcome to this interview. Perhaps you can say something about your company, a bit of its background and your own, and what was the reason for coming up with something like the Disruptor?

David Farley: I’ll start with some of that. LMAX is a spin-off from Betfair, the world’s biggest sports betting company, and what we’re trying to do is bring exchange technology to a retail finance marketplace, which is unusual - most systems haven’t worked that way up until now. That brings an interesting combination of problems. We have to worry about the traditional high-frequency, low-latency dimension of performance, to satisfy the needs of market makers and algorithmic trading, because low latency is fundamental to the narrowness of the spread in a marketplace. And we also have to worry about the problem of being able to scale up - the classic internet problem - since we hope to have tens of thousands, maybe hundreds of thousands, of users accessing the system and managing their accounts and so on. So, it’s an interesting confluence of problems.

My own background: Martin and I worked together many years ago as consultants. More recently I worked for ThoughtWorks, and we started the technology department of LMAX together four or five years ago and have been working on this project since then.

Martin Thompson: Yes, we kicked that off five years ago inside Betfair to offer a new type of product to retail customers. Until then, they had to go through brokers, so we’re a kind of classic disintermediation play, whereby we’ve got wholesale market makers offering their products and prices directly to retail. We take no position ourselves, so the prices you see are the prices you get, and we offer fast, quality execution, trying to bring this new kind of player to the market.


2. So let’s speak about the challenges - what kind of technology approaches did you take, or try out, to get there? Obviously you’re almost there, or already there, being able to scale up while also being low-latency and high-throughput. What did you do to get there? What was your approach?

David Farley: We took a very iterative approach. We actually tried and failed several times during the course of the history of the company, and tried and succeeded once. We started off trying ideas similar to many people who play in the reasonably high-performance computing space. We started off with staged event-driven architectures and tried to distribute the behavior between multiple threads, and it wasn’t as fast as we wanted or needed it to be.

We measured that, and we found that we spent more time trying to figure out where to put work than actually doing work. At that point a light bulb moment went off and we started thinking about tackling this problem somewhat differently - a key feature of our system is that the system of record is in-memory and operates on a single thread.

So, it’s very simple in that context, but it does give us properties to worry about as well as positive properties, in terms of having a completely deterministic system. So, we have an event-sourced system where we have a stream of events coming in that get processed by these services, which store their state in memory and process the operations as they come in. And it does this at phenomenally high rates, and the programming model is really simple. So, we got lots of wins once we got to the right solution.
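As a rough illustration of that shape - not LMAX’s actual code; the event type and field names are hypothetical, and a plain queue stands in for the input stream that the Disruptor, discussed later, replaced:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// All business state lives in memory and is mutated by exactly one thread,
// so the domain logic needs no locks, and replaying the same event stream
// always reproduces the same state (determinism).
public class InMemoryService {
    static final class OrderPlaced {
        final String account;
        final long quantity;
        OrderPlaced(String account, long quantity) {
            this.account = account;
            this.quantity = quantity;
        }
    }

    private final Map<String, Long> positionByAccount = new HashMap<>();
    private final BlockingQueue<OrderPlaced> events = new ArrayBlockingQueue<>(1024);

    // Producers hand events to the single consumer thread.
    public void publish(OrderPlaced event) throws InterruptedException {
        events.put(event);
    }

    // The single business-logic thread consumes the stream in order.
    public void run() throws InterruptedException {
        while (!Thread.currentThread().isInterrupted()) {
            apply(events.take());
        }
    }

    private void apply(OrderPlaced event) {
        positionByAccount.merge(event.account, event.quantity, Long::sum);
    }
}
```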

But an important part of the approach for me personally, I think, or for us as a team, is that we try to apply a very scientific approach: measuring the real outcomes, not depending on what we’ve been told or what everybody else believes to be the case, finding out for ourselves what really works, and changing things where they don’t work.

Martin Thompson: I think that’s been so important, because a lot of the research out there is all about throughput, and people generally get throughput by going parallel and throwing lots of hardware at the problem. We discovered that we didn’t have a problem with throughput - we could do phenomenal rates. What we couldn’t get was predictable, low latency, and that’s a space that’s not much talked about at the moment, so you can’t just go and pick up a standard tool. We had to use a scientific approach to work out how to eradicate latency from our system.

So we were looking at the things that were causing pauses, looking at where queues would be forming, where steps in a process and pipelines were backing up, and by looking at that, looking at what alternatives we had, and coming up with experiments to test new techniques, we were eventually led to the technologies we use today. I think that’s how we ended up in a very different place: we have different drivers. Most people have systems where latency is measured against human response time; in the finance space, latency is in some cases measured by the speed of light between two points, so you’re dealing in microseconds these days, and even lower in some cases - for many things you’re measuring, it’s not human reaction time.


3. I was quite surprised when I learned that your platform is running on Java, because it was always said that the JVM was slow and Java is slow. What were your discoveries there, and how did you get there? What kinds of factors did you look at to achieve that?

Martin Thompson: I think it’s the same thing I’ve been talking about: Java is actually quite a fast language which, used correctly, can give phenomenal throughput, even compared side by side with C++ - there are only a few percentage points in it these days for most basic algorithms. Latency is the hard problem to address. For example, if you’re dealing with contended locks, it doesn’t matter what language you’re using: you’re getting the operating system involved at that stage, and once the operating system gets involved, you’re polluting caches, you slow down the response time, and you don’t get predictable responses anymore. You may get a lot more throughput in some cases, but as the locks get contended you start losing throughput as well as latency. So, you’ve got to move it all up to the user-space side of the kernel, using techniques at that level.
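A minimal sketch of that contrast, using nothing beyond the standard library:

```java
import java.util.concurrent.atomic.AtomicLong;

// A contended intrinsic lock can park threads via the OS scheduler, while a
// compare-and-swap retry loop stays in user space. Illustrative only, not a
// benchmark.
public class Counters {
    private long lockedCount;
    private final AtomicLong casCount = new AtomicLong();

    // Under contention this may involve the kernel (thread parking and
    // wakeup), polluting caches and making response times unpredictable.
    public synchronized void incrementWithLock() {
        lockedCount++;
    }

    // CAS retry loop: stays in user space, no scheduler involvement.
    public void incrementWithCas() {
        long current;
        do {
            current = casCount.get();
        } while (!casCount.compareAndSet(current, current + 1));
    }
}
```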

At that stage, Java can perform just as well as almost anything. I think the one thing you’ve got to watch out for with Java is that using lots and lots of objects causes garbage collection, so you’ve got to keep garbage collection well controlled: allocate the minimum of objects and make sure you tune your systems so that they are collected efficiently.
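For instance, one common allocation discipline is to reuse a preallocated, mutable holder instead of creating a new object per message; the names here are hypothetical:

```java
// Reusing one scratch instance means the young generation fills far more
// slowly than with one allocation per incoming tick.
public class PriceFeedHandler {
    // One preallocated, reused instance instead of an allocation per message.
    private final MutablePrice scratch = new MutablePrice();

    public void onTick(long instrumentId, long price) {
        scratch.instrumentId = instrumentId;  // overwrite in place
        scratch.price = price;
        process(scratch);                     // callee must not retain the reference
    }

    private void process(MutablePrice p) { /* business logic */ }

    static final class MutablePrice {
        long instrumentId;
        long price;
    }
}
```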

David Farley: I think, like any other choice, there are pluses and minuses. Choosing a language like Java, as Martin says, binds you to the problem of garbage collection, and that needs to be in your mind as part of your approach and design. You need to think about writing Java in a way that minimizes the garbage collection problem.

Again, one of our themes is the application of mechanical sympathy: understanding a little bit of what’s going on underneath the covers and letting that influence the choices you make in design. A good example is the generational garbage collector in Java - at least the standard one - and understanding how it works, if that’s the one you’re using, and tuning it. That’s an important part of getting Java to work effectively. But there are some pluses as well: compiler technology has moved on a lot.

Runtime environments like the JVM and the CLR have more opportunities to optimize. Something like a C++ compiler has one shot at optimizing, at compile time, and you don’t know if that’s fast enough or not.

Something like the HotSpot compiler can observe the behavior of your application as it runs and start tuning it to work better. We just finished a talk here at the GOTO Conference, and one of the points that Martin made that I rather like is that one of the things you do need to worry about with Java is that a lot of Java programmers have started to forget what’s going on underneath the covers and write code in very wasteful ways.

One of the advantages, I think, of languages like C++ is that the community is probably a little more hardcore and thinks about these things more naturally. So, we’ve built a very good team, and we’ve grown our ability to think about these things over the years that we’ve been doing it, and that’s an important facet, too.


4. So it’s about making informed choices: knowing what you’re dealing with at the hardware level, at the JVM level with garbage collection, in the libraries you use - or choose not to use - and also, as you said in your presentation, in your domain model. So the quality of your domain model also influences how well the application performs.

David Farley: That’s a particular thing I care a lot about. In my past life working as a consultant, going into organizations, a common complaint was that the performance of the systems was poor, or something like that. And many of the systems I looked at in those days were complicated and hard because they were high-performance systems. I think the inverse is true: genuinely high-performance systems need to be simple. In order for a system to be high-performance, it needs to be optimal; it needs to be doing the minimum amount of work for the maximum outcome, whatever that may be.

And if it’s doing the minimum amount of work, that means the algorithms need to be simple, the code needs to be straightforward, and there are fewer places for performance bottlenecks to hide. Keeping everything simple and straightforward is also one of the hardest things we’ve ever had to learn to establish as part of our team culture. If we come to a problem and it feels hard, we’re probably thinking about it in the wrong way. We probably need to think more about the design and look for that simpler, more elegant solution - and that’s a real win in terms of performance design.

Martin Thompson: I think we have a tendency to overcomplicate things, mostly driven by having a very insular view of a problem. I often see people who look at a problem and then look just at the layers and interfaces on either side of it, rather than looking at the problem overall - where you may be taking in an order and trying to send back a trade report.

Beyond that, it doesn’t matter what happens in there. There’s nothing in that specification that says it must be converted to XML on the way. Yet our industry seems to have gone that way, with people translating things between layers and frameworks.

One of the things I like to encourage people to do, whenever they pick up a framework or a library, is to ask: “Does it pay for itself?” If adopting this framework gives you an advantage, great - use it. If you find it’s actually adding a lot of cost to what you’re doing, and you’re using it just because everybody else does, don’t be a sheep and follow the herd. Ask the question, “Does it give you value?”, and don’t use it if it doesn’t. And be comfortable with that, rather than just following.


5. Sometimes it even adds a cost later on that you don’t realize upfront, and you have to be brave enough to cut it out again. So it’s about cutting away all the auxiliary complexity, down to the core of the problem. As you quite often see in business systems, people anticipate future uses, future use cases, extensibility and all the other stuff - something Greg Young also talked about at the conference - and forget to focus on the core domain. It’s probably also much easier if you have a very limited core domain with just a few use cases, and can focus on them and ignore all the corner cases and sidetracks you could run into. I don’t know - how was that for you?

David Farley: I don’t know if it is easier in our domain - that’s something people have said to us a few times. There are certainly classes of problem that wouldn’t suit the architectural choices we made for solving ours. However, I think the approach we’ve taken works. Our system, as I said, is very stateful: the system of record in the principal components of our system is the in-memory state of a rich object-oriented domain model. That’s where we serve the answers from, and that’s where the order books for matching are, the accounts, the orders and so on.

Clearly, there are some problems that wouldn’t fit into that, but hardware is phenomenal, and the amount of memory that you can put into a modern server is enormous. My first computer had 1K of memory and an 8-bit processor. The amount of stuff we can store now is enormous, the amount it can process is absolutely phenomenal, and we often throw those advantages away through very poor use of the technology.

And I think this is applicable to a very broad suite of problems; the only sorts of things that rule out this approach, as far as I can see, are genuinely very large data sets. Apart from that, if your working set is anything reasonable, you can probably do it in memory - faster, simpler and better than going through many architectural layers and dealing with the technological hurdles they impose on you.

Martin Thompson: I think with the memory question it’s not just the total memory, it’s the working set that matters. In so many applications the working set is quite small; you may have a very large amount of historical data, but the working set tends to be very manageable, and you can always stage data out of memory to some other form of storage. Then, if you’re doing very large batch jobs, you can stream that data back from the other storage, work on it, and stream the results back out again. Working in memory is a very efficient way to work, and it’s a very nice programming model: you don’t have to go through layers and layers to work with your data in a nice, friendly manner.


6. Each layer adds additional latency to your system. As you said, in principle it’s a single business processor running on a single thread, and streaming input-output is also a keyword. You skipped the queuing approach that people normally use for getting messages in from some external source and sending messages out to another; instead you developed a pattern for that. Could you quickly outline it? We don’t have to go into the details, because it’s all open-source and you’ve written about it.

Martin Thompson: So, we’ve open-sourced the Disruptor project. It’s kind of interesting: we had a lot of problems with queues in the system, fundamentally because queues are contended; they don’t respect the single-writer principle. When you have multiple threads, or any sort of execution context, accessing the same data or resource, you have to manage that contention - introduce locks, introduce CAS operations, whatever - and that adds latency, restricts your throughput and greatly increases complexity.

Queues, by their very design, have this contention built in. So we tried to evolve the design. We didn’t just try to dream up a brand-new design; we asked, “What are the issues? How could we eliminate them?”, and step by step we eliminated them until we ended up with a design that didn’t suffer from the same issues as queues. That happens to be the Disruptor. It has phenomenally greater throughput, but that’s not the interesting point, especially in finance: it has three orders of magnitude lower latency, and that’s what matters. It also has very predictable latency, especially under burst conditions.
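A minimal sketch of using the open-sourced Disruptor follows; note that the public API has evolved since this interview, and this uses the later DSL style rather than the 2.0 API of the time:

```java
import com.lmax.disruptor.RingBuffer;
import com.lmax.disruptor.dsl.Disruptor;
import java.util.concurrent.Executors;

public class DisruptorExample {
    // Events are preallocated in the ring buffer and overwritten in place,
    // which is also how the Disruptor avoids per-message garbage.
    static final class ValueEvent {
        long value;
    }

    public static void main(String[] args) {
        int bufferSize = 1024; // must be a power of two
        Disruptor<ValueEvent> disruptor = new Disruptor<>(
                ValueEvent::new, bufferSize, Executors.defaultThreadFactory());

        // A single consumer; sequence coordination replaces queue locks.
        disruptor.handleEventsWith((event, sequence, endOfBatch) ->
                System.out.println("got " + event.value));
        disruptor.start();

        // Producer side: claim a slot, write into it, publish it.
        RingBuffer<ValueEvent> ringBuffer = disruptor.getRingBuffer();
        ringBuffer.publishEvent((event, sequence) -> event.value = 42L);
    }
}
```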

With most other structures, when you get burst traffic, latency increases; with the Disruptor, latency stays completely flat until you saturate the hardware. It gives really nice characteristics in how it batches up the traffic going through it. Why is it out there as open-source? There are two really simple reasons. One is that we’re trying to recruit people - a quick plug on that front - and it’s a really good way of getting people to see what we’re doing. But also, because we’re a venue at the heart of trading, we want to stimulate this environment.

A lot of people trade against our APIs and have to build this infrastructure up from scratch. They’ve got great trading ideas, but they don’t have all of the skills and infrastructure; they can start going down that route, but it takes a lot of their time. So, we want to give to all the different parties: we have an API to our exchange, and we’ve given it out for C#, Java, PHP - that’s extending all the time. We can offer that to make trading against our venue easy, and then we offer things like the Disruptor so people can build their handling of I/O, and their handoff between different threads of execution, in a very fast and efficient manner.

David Farley: Yes, and as Martin said, it’s an important point, to my mind, that we didn’t set out to design the Disruptor. Like everybody else, we started out using queues to separate our processing, and we measured it. It didn’t work; it wasn’t fast enough. We were seeing problems; we weren’t even close to what we knew to be the theoretical limits of the hardware, and so we went through this evolutionary process.

The only other thing I’d like to say, because some of the feedback occasionally touches on this, is that we recognize this isn’t a brand-new thing. What it is, is a confluence of ideas that have been around in computer science and technology for, in some cases, decades - we’ve put them together in a way that makes them efficient on modern hardware.

As far as we’ve seen, the Disruptor gets close to the theoretical limits of the hardware, and that’s unusual, because of the levels of abstraction we talked about earlier that tend to get stacked on top of the hardware. We don’t very often get to those sorts of levels of performance, which is interesting because it stresses other parts of the system at times, and digging into some of those things is interesting. It’s been an interesting process to go through to evolve this.


7. When you open-sourced it, what was the feedback from the community? How did people react to it? And did you see any interesting uses of the Disruptor out in the field?

Martin Thompson: Lots. I think it’s been a great eye-opening exercise, because I think we’ve benefited as much from it as the community has from what we’ve given it.

So I think it’s one of those things where you have to give to get. It’s great. We’ve had feedback that has greatly improved the framework and made us think about things in different ways, because on our own we only run our own use cases against it - we have quite a few, but it’s still a limited set.

We see other people with other use cases, which is very interesting, and then they give feedback. There are some really talented people out there who have given feedback, ported it to other languages, and fed back how they dealt with it on their own platforms. An early good one was the Axon framework: they were talking about how they were doing just over 200,000 messages - or commands - per second through their framework; after putting the Disruptor in and doing some tuning, that went up to 1.2 million. It’s so nice to see those step changes. So often we talk about a 10% improvement here and a 10% improvement there; to have a step change and go up an order of magnitude is really nice to see.

That is possible because modern hardware has got so fast, but our software drives it nowhere near its theoretical limit. David has mentioned our scientific approach, and my view of mechanical sympathy is: if you know what the hardware is capable of and your software is getting nowhere near that performance, there’s something wrong - you’ve got something wrong in your abstraction layers. So measure the latency between cores on the CPU, or the bandwidth with which you can move things between them. If your software is getting a small fraction of that, there’s something very inefficient in what’s going on.
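A crude sketch of that kind of measurement: bounce a value between two threads through shared memory and time the round trips. A serious measurement needs far more care (core pinning, warm-up, and so on); this only shows the idea:

```java
// Two threads ping-pong a counter through volatile fields; the round-trip
// time approximates the cost of inter-thread (often inter-core) signalling.
public class PingPong {
    static volatile long ping = -1;
    static volatile long pong = -1;

    public static void main(String[] args) throws InterruptedException {
        final int rounds = 1_000_000;
        Thread responder = new Thread(() -> {
            for (long i = 0; i < rounds; i++) {
                while (ping != i) { /* busy spin */ }
                pong = i; // echo the value back
            }
        });
        responder.start();

        long start = System.nanoTime();
        for (long i = 0; i < rounds; i++) {
            ping = i;                          // send
            while (pong != i) { /* busy spin */ } // wait for the echo
        }
        long elapsed = System.nanoTime() - start;
        responder.join();
        System.out.printf("%.0f ns per round trip%n", (double) elapsed / rounds);
    }
}
```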

David Farley: It’s one of those things that, if you stop and think about it (which, I confess, like everybody else I hadn’t done for a long time until I started working on this project), the levels of inefficiency that we accept as normal in software are staggering. They would be inconceivable in almost any other sphere of human activity. We will accept hundreds of thousands of times less performance than the hardware offers - that’s the scale of inefficiency we’re seeing in some things.

Imagine if a car was a hundred thousand times less efficient. It’s important in a wide variety of ways. I don’t know what the real numbers are - I read somewhere some number for the percentage of carbon emissions that data centers are responsible for, and if it’s true, it’s a significant fraction of all the carbon emitted on the planet. If you can make your software even ten times more efficient, that means ten times less hardware, and ten times less carbon being pumped into the atmosphere. I’m not suggesting that the Disruptor will save the planet by any stretch of the imagination, but just thinking in terms of sensible use and not ignoring these things is important.

And I was one of the people who for many years would go out as a consultant and say, “The important thing is not to worry too much about the performance but to get the design right.” I don’t believe that anymore. I think you need to be worrying about performance and thinking about it as part of the daily activity of writing code.

Martin Thompson: I think you come out the other side with a different mindset as well. If you strive for performance - and by that I don’t mean lots of bit-shifting and weird code, but writing really clean, simple, elegant code that correctly models the business domain - it’s very performant, but it’s also simple and elegant, really easy to fix, really easy to evolve. Striving for efficient code doesn’t mean it takes much longer or that you end up with a really complex solution.

I’ve had people work with me on projects and get used to this way of working. When they go elsewhere, they say, “I feel like something’s broken. I want to get back to that way of working again.” Our industry says, “No, performance is for weird people, it’s all really strange,” but putting in three layers of framework and then trying to comprehend what those frameworks are doing is far more complicated.


8. So it’s mostly clean code principles, and being able to have informed discussions and make informed decisions about how your code affects the hardware? And it’s not too hard - it’s not rocket science to understand how a processor cache, striding and prefetching, or branch prediction works. It’s within the scope of what a normal developer can grasp, right?

Martin Thompson: The really important discussions are elsewhere. Scientific problems and other kinds of problems are different, but most business problems come down to people, stuff and deals - that’s all it is. We have people who buy things, there are things that are sold, and deals are done in that process. Modeling the relationships between those is far more important than any framework, if you understand the characteristics.

What are the characteristics? How many deals does a typical person do? How many deals are done in a day? Understand the cardinality, how those relationships work, and choose the right data structures for walking those relationships - that’s one of the biggest wins in performance. You can have micro-benchmark shootouts, but I find over and over again that the algorithms people choose much further up the stack matter much more. We should be spending our days understanding the model - that’s a discussion with the business. How many of these things happen in a day? How do you want to use this data in the future? What do you need to know from this data right now? Then design models to do that. It’s very performant, and it’s the right thing to be doing for the business.
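As a hypothetical illustration: if the business says a typical account books only a handful of deals a day, a small list scanned linearly is often a better fit than a hash-based structure chosen by default:

```java
import java.util.ArrayList;
import java.util.List;

// At a cardinality of "a handful per day", a short linear scan is simple,
// predictable and fast; a hash-based index would only pay for itself at much
// higher cardinality. All domain names here are made up.
public class Account {
    static final class Deal {
        final long instrumentId;
        final long quantity;
        Deal(long instrumentId, long quantity) {
            this.instrumentId = instrumentId;
            this.quantity = quantity;
        }
    }

    // Sized from the expected cardinality, not a default guess.
    private final List<Deal> todaysDeals = new ArrayList<>(8);

    void book(Deal deal) {
        todaysDeals.add(deal);
    }

    long totalQuantityFor(long instrumentId) {
        long total = 0;
        for (Deal d : todaysDeals) { // a few entries: scanning beats hashing here
            if (d.instrumentId == instrumentId) {
                total += d.quantity;
            }
        }
        return total;
    }
}
```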


9. In the end it’s a whole approach, right? You can’t just look at one layer of the application; you can’t just look at the mechanical sympathy - you also have to worry about the clean code in between, the clean domain model, and about getting the actual, real requirements from the business people - looking closely at your actual traders, in your case, and working on this together. It’s a whole approach.

Martin Thompson: One of the most amazing failures of our industry is that we create code that is not a representation of the real business domain; it’s an approximation of it. How many times do you look at the code and there is no class named anything like what the business people talk about? That’s the difference. What’s happened there is that you need a mapping between the real business domain you’re dealing with and your code, because the code is not an exact representation of the business domain; it’s an approximation that needs to be mapped. Where does that mapping live? In people’s heads.

So, straight away you’ve got problems: if you lose the people, you’ve got problems sharing it, and we’re all fallible individuals. Quite often I can’t remember exactly what I did last week and have to dig it up again. So, what’s the best way of documenting what’s going on? Put it in the code.


10. I am also very keen on the notion that software development is actually just a learning process, and the software created in the end is the materialization of what we’ve learnt, of the knowledge we’ve acquired, and it should be in the vocabulary of the domain we’ve been working with. There shouldn’t be any media breaks in between to understand it. You should be able to show your code to someone from the domain, and from the names and operations they should at least have a grasp of what’s happening in this piece of code, right? They don’t have to get all the details, but that’s very important, I think.

Martin Thompson: My mental model is that code is a specification of my current understanding, and that applies to both the hardware and the business. If you drive the hardware correctly - if your code specifies how the hardware works and drives it - you can get amazing performance; that’s mechanical sympathy. It’s exactly the same for the business world. We’re learning all the time - I love that expression - so the code is the specification of what we know at this current point in time. Tomorrow we learn more, and we change the code to reflect what we’ve learned. Don’t let it drift off in different directions; keep it a living, breathing thing that is your current understanding.

David Farley: Yes. There’s a core of fairly straightforward, basic practices where the real value lies, and as an industry we often think the value is somewhere else. A lot of the reason our software doesn’t look like that - rich domain models that are simulations of the business problem - is that we conflate the concerns of the technology with it. Yes, it would be really nice to have this customer object or order object, but we’ve got to store it in a database, so we put the code for storing it in the database in the same object.

And so we don’t insulate and protect ourselves. That may sound counter to what we were saying about abstraction, but these are separate domains, and some of those domains are technological, as Martin was describing. The Disruptor works in the domain of the hardware: it’s designed to worry about the underlying hardware and to work effectively with it. Our business logic is an abstract simulation of the business problem and knows nothing about the hardware it runs on - you could move it to a completely different environment and it would still be the same code, without a line changed. That focus on making sure you’re working in the correct place, and not bundling concerns together, is just good software development, which we’ve all known about for many decades.

Martin Thompson: But often ignored and forgotten.


11. While working with the Disruptor pattern, I was thinking about the Disruptor as actually a collection of patterns - not just a single pattern but a pattern system - because you address all these different concerns inside the Disruptor. Did you ever think about pulling those concerns out and giving them names too, having patterns for those?

Martin Thompson: We have pretty much done that. For example, if you want to exchange data between two threads, you want to claim the memory you are going to exchange and publish it across - we have a claim strategy for that. When you’re on the other side, waiting for an event to become available, we have a wait strategy, and there are different variants of each. On the claim side, for example, you can have a single-threaded claim strategy or a multi-threaded claim strategy, so you can get the ultimate performance if you know your interaction pattern. The same on the other side: when waiting, you can busy-spin and burn CPU, getting the absolute maximum out of the CPU for the lowest possible latency; or you can take different strategies - back off, yield, sleep, or even lock and wait. You get the choice.
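A sketch of those choices using the open-source Disruptor’s classes; the names follow the later 3.x API, where the single/multi-threaded claim choice became the producer type, while the 2.0 API of the time expressed the same options as ClaimStrategy and WaitStrategy variants:

```java
import com.lmax.disruptor.BlockingWaitStrategy;
import com.lmax.disruptor.BusySpinWaitStrategy;
import com.lmax.disruptor.RingBuffer;
import com.lmax.disruptor.YieldingWaitStrategy;

public class StrategyChoices {
    static final class ValueEvent { long value; }

    public static void main(String[] args) {
        // Single-writer claim + busy-spin wait: burn a core for minimum latency.
        RingBuffer<ValueEvent> lowLatency = RingBuffer.createSingleProducer(
                ValueEvent::new, 1024, new BusySpinWaitStrategy());

        // Multi-writer claim + yielding wait: back off a little under light load.
        RingBuffer<ValueEvent> balanced = RingBuffer.createMultiProducer(
                ValueEvent::new, 1024, new YieldingWaitStrategy());

        // Blocking wait: lowest CPU use, highest and least predictable latency.
        RingBuffer<ValueEvent> frugal = RingBuffer.createSingleProducer(
                ValueEvent::new, 1024, new BlockingWaitStrategy());
    }
}
```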

And by teasing out the concerns, you can provide that choice, and it’s also clearer how they all work. A lot of the performance actually comes from teasing the concerns apart, because as soon as you conflate concerns, the complexity goes up as the product of the concerns involved, and that impacts the performance and the way you can tune and deal with it.


12. Disruptor 2.0 was released at the end of August, right? And there has been lots of other stuff happening around the project: you got a Duke’s Choice Award at JavaOne, last year there was the presentation at QCon San Francisco, and lots more going on. So what do you think the future of your approach and your project is? Will you also open-source other parts of the architecture that would be useful for consumers of your APIs? Do you want to evolve the Disruptor without letting it become a too-complex framework? Because that’s kind of difficult.

David Farley: As an organization, we are very much in favor of open-source software and want to share with the community - and, as Martin says, get back from the community in return. The Disruptor has been the most successful, but we’ve already open-sourced a few other things, most of them more around the continuous delivery process and the software development process that we use. There’s a software package called Freud for writing static-analysis tests of your code, and there’s an open-source micro-benchmarking project that was spun off from our project by a guy who is no longer with us, and so on.

There are some other bits and pieces, but it’s actually quite an expensive activity to set up an open-source project, and I’m not quite sure whether we have anything else up our sleeves that would be quite as successful as the Disruptor in the immediate future. In terms of the future of our organization, our business and the Disruptor, on a personal level Martin and I hope that people will use it.

We think there are lots of patterns that make writing software simpler if you use this sort of technology and apply some of the lessons that we learnt at LMAX, and we hope that happens. From LMAX’s point of view, we’re a start-up company, and we’re going to push on, adding new features and building a client base.

Martin Thompson: It tends to be very organic and driven by people’s own time. There is work we did that wasn’t for our own needs - I think Disruptor 2.0 is a good example; it was very much community-led. A lot of the features added in it were there because the community asked for them; we didn’t actually need them inside the company. But on reflection we gained from it: we started thinking about different ways of working, and now we’re actually able to benefit from some of those things.

I think no one has the perfect picture of the world. Some of the stuff we thought was quite good, but through the feedback we’ve had, and how we’ve been challenged to think, we’ve discovered there are even better ways of doing things, and those ideas come back in. For example, the challenge to remove memory barriers was an interesting one. It spun out of dealing with some false-sharing issues; we’d been talking with some people about other things, and in dealing with their problem I had a lateral thought: I knew enough about the memory model, and enough about the algorithms used inside the Disruptor, to realize we didn’t need the memory barriers. You get these light bulb moments.
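False sharing, for reference, is when two independent fields written by different threads land on the same cache line, so each write invalidates the other core’s cached copy. The padding sketch below, in the spirit of what the Disruptor does internally, keeps a hot field alone on its cache line (newer JVMs also offer an @Contended annotation for this):

```java
// Surround a hot, frequently written field with unused longs so that no
// unrelated data shares its cache line. The split into superclasses keeps
// the padding on both sides of the field in memory layout.
public class PaddedCounter {
    @SuppressWarnings("unused")
    static class LhsPadding { long p1, p2, p3, p4, p5, p6, p7; }

    static class Value extends LhsPadding { volatile long value; }

    @SuppressWarnings("unused")
    static class RhsPadding extends Value { long p9, p10, p11, p12, p13, p14, p15; }

    public static final class Sequence extends RhsPadding {
        public long get() { return value; }
        public void set(long v) { value = v; }
    }
}
```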

We would never have gotten that without interacting with the community and seeing other people’s problems; it moves things forward. Also, we’re a huge consumer of open-source, so there’s that interaction too. I think it’s one of the biggest shames of our industry - as was mentioned last night in the keynote - that we don’t see the work of other people. In pretty much any other profession on the planet, you share your work with other people, and that’s what brings the field forward.

I think it’s one of the reasons why we don’t learn from the history of our subject, not even the published stuff: it is not in our culture to share, understand and work with other people, and we don’t build on the “shoulders of giants” principle of physics, for example. I think that’s holding us back a lot, because there has been such great work in the history of our subject that most people are unaware of.

Jan 05, 2012
