Martin Thompson on Low Latency Coding and Mechanical Sympathy

1. I’m Charles Humble and I’m here at QCon London with Martin Thompson. Martin, could you introduce yourself to the InfoQ community please?

Hi Charles, my name is Martin Thompson. I’m an independent consultant and I specialise in High Performance Systems. I help clients who want to get greater throughput or lower latency from their systems. In the past I have been CTO of LMAX, I have been the person who started some of the first internet banks in the UK and I’ve worked in a range of things from finance to the movie industry and media.

2. I guess you came to a lot of people’s attention when you started talking about LMAX’s architecture 2-3 years ago, because the design there seemed very different from how people traditionally thought about writing Low Latency Systems. Could you maybe describe it to us?

How a lot of people develop Low Latency Systems is actually quite like how LMAX works; it’s very common, most exchanges follow similar sorts of patterns - they’ve been driven towards becoming event-driven systems. That sort of system is not normally talked about anymore, although the design patterns go right back to the 1960s. Some of those patterns were similar in IMS, which was developed in 1962 for the Apollo Space Program, right back when IBM were trying to process those events, so a lot of it's not new. What we seem to be very good at in our industry is forgetting the past. So a lot of what we tried to do by talking about that publicly with LMAX was to show some of the cool things that we were doing, and that helped us recruit.

What it actually meant is that we weren’t following the normal sort of J2EE, or Grails or Rails type development that a lot of people are doing today; those are just a sample of how people are developing, people are developing in many different ways, but they tend not to think about developing in-memory applications. They tend to be very much database backed, with layers of frameworks and layers of caches, be that Memcached, Hibernate, different things like that. What we wanted to do was have a pure in-memory system that could run at in-memory speeds, but then develop it in a way so that it was highly available and resilient, and could be clustered and run in a way that would give us all the in-memory performance but all the resilience and high availability you normally have with a database; and that drove a lot of our design.

The Disruptor came out of that because we started looking at, "Well how do we do multiple things in parallel?" For example within a design like that you need to replicate to other nodes, you need to replicate to other datacenters, but you also need to write down to disk. If you are having to wait for each of those operations to happen one after another, you end up with a very long Latency to do things and you get spikes in the Latency, because for example, if the disk is slowing down you end up holding up everything else behind it, when you could be replicating elsewhere. So the first major thing that drove the design of the Disruptor was, “How do I do all of that in parallel?”, "How do I process the business logic, replicate, write to disk, un-marshal data and have it all happen independently?" and that was one of the things we achieved by having event processors running that way.
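As a rough illustration of the idea described above (independent consumers working through the same event stream at their own pace), here is a minimal Java sketch. It is not the Disruptor's API: the class `ParallelConsumersSketch`, its methods and the ring size are all hypothetical, and the busy-wait with `Thread.yield()` is a simplification. One producer publishes into a ring buffer, each consumer tracks its own sequence, and the producer only waits when the slowest consumer has not yet freed the slot it wants to reuse.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch, not the Disruptor API: one producer, several
// independent consumers (think journaller, replicator, business logic).
class ParallelConsumersSketch {
    private final long[] ring;
    private final int mask;
    private final AtomicLong published = new AtomicLong(-1); // last sequence written
    private final AtomicLong[] consumed;                      // last sequence read, per consumer

    ParallelConsumersSketch(int sizePowerOfTwo, int consumers) {
        ring = new long[sizePowerOfTwo];
        mask = sizePowerOfTwo - 1;
        consumed = new AtomicLong[consumers];
        for (int i = 0; i < consumers; i++) consumed[i] = new AtomicLong(-1);
    }

    private long minConsumed() {
        long min = Long.MAX_VALUE;
        for (AtomicLong c : consumed) min = Math.min(min, c.get());
        return min;
    }

    // Called by the single producer thread only.
    void publish(long value) {
        long next = published.get() + 1;
        // Do not overwrite a slot until every consumer has read its old content.
        while (minConsumed() < next - ring.length) Thread.yield();
        ring[(int) (next & mask)] = value;
        published.set(next); // volatile write makes the slot visible to consumers
    }

    // Each consumer runs this on its own thread and never blocks the others.
    void consume(int id, long count, List<Long> seen) {
        for (long seq = 0; seq < count; seq++) {
            while (published.get() < seq) Thread.yield();
            seen.add(ring[(int) (seq & mask)]);
            consumed[id].set(seq);
        }
    }

    // Publishes `events` values and returns what each consumer observed.
    static List<List<Long>> run(int events, int consumerCount) {
        ParallelConsumersSketch buffer = new ParallelConsumersSketch(8, consumerCount);
        List<List<Long>> results = new ArrayList<>();
        Thread[] threads = new Thread[consumerCount];
        for (int i = 0; i < consumerCount; i++) {
            final int id = i;
            final List<Long> seen = new ArrayList<>();
            results.add(seen);
            threads[i] = new Thread(() -> buffer.consume(id, events, seen));
            threads[i].start();
        }
        for (long v = 0; v < events; v++) buffer.publish(v * 10);
        for (Thread t : threads) {
            try { t.join(); } catch (InterruptedException e) { throw new IllegalStateException(e); }
        }
        return results;
    }

    public static void main(String[] args) {
        for (List<Long> seen : run(100, 3)) System.out.println(seen.size());
    }
}
```

In this sketch a slow consumer delays the producer only once the ring is full; until then the other consumers carry on independently, which is the property the answer above is describing.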

We also didn’t want the context switches into the kernel, and at the high end of performance people tend to write what is known as lock free code, and that is code that doesn’t use locks, be they mutexes in the C world or, in Java, your standard locks, which would be the synchronized blocks, because whenever you get contention on those locks you have to get an arbitrator involved, and that arbitrator is the OS kernel. When we swap to the OS kernel it’s going to do a lot of work on your behalf: it’s going to run a lot of CPU cycles, it’s going to pollute your cache, and then when it eventually schedules your process to run again, you may be running on another core with a cold cache, and that can slow things down. I like to use the analogy of, it’s like getting lawyers or the government involved when you're trying to coordinate a simple activity: If you and I want to just agree something, we agree a protocol by which we can agree that coordination. It’s so much more efficient than saying: “OK let’s get the lawyers in and then we'll work out how to do a simple thing like how do we both get out the door together”.
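To make the lock-free idea above concrete, here is a small Java sketch of a counter updated with a compare-and-set retry loop instead of a `synchronized` block. The class name `LockFreeCounterSketch` and its methods are hypothetical; `AtomicLong.compareAndSet` is the standard JDK call. On contention a thread simply retries in user space rather than asking the kernel to arbitrate.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of lock-free coordination via compare-and-set (CAS).
class LockFreeCounterSketch {
    private final AtomicLong value = new AtomicLong();

    // Retry until our update wins; no lock is taken, so no thread is parked.
    long increment() {
        while (true) {
            long current = value.get();
            long next = current + 1;
            if (value.compareAndSet(current, next)) return next;
        }
    }

    long get() { return value.get(); }

    // Runs several threads against one counter and returns the final value.
    static long run(int threads, int incrementsPerThread) {
        LockFreeCounterSketch counter = new LockFreeCounterSketch();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int n = 0; n < incrementsPerThread; n++) counter.increment();
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try { t.join(); } catch (InterruptedException e) { throw new IllegalStateException(e); }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(run(4, 10_000));
    }
}
```

Note that this still has several threads contending on one location, so CAS retries cost something under load; it only removes the kernel from the picture.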

3. So you need to have one core writing to one memory location if I’ve understood you correctly. Is that right?

Ideally the idea is to follow what I call “The single writer principle.” At any given point in a design, if you have multiple writers to the same location you have contention, and that contention needs to be managed, and that slows down your application, because queues form behind that contention point. You can only ever have one thing updating anything at one point in time. If two threads write to the same piece of memory without coordination, it will end up corrupt, so you need to schedule the writes in such a way that you queue them up. By scheduling them like that you get a queue formed and Little’s Law comes into effect, and that introduces Latency and slows things down. If you change the design so that any given resource is only written to by one thread of execution, be that a process, a thread, a service, whatever, you then don’t have that queueing effect because whatever is doing the update can run at full speed. Many other things can read that, that is absolutely fine; so when you are talking about cores writing to a piece of memory, the cache coherency system can have all of those changes made available on many other cores and many other caches, that's absolutely fine and scales very, very well as long as only one thread is ever writing to that piece of memory.
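The single writer principle described above can be sketched in Java as follows. This is an illustrative assumption about how one might apply it, not code from LMAX: the class `SingleWriterSketch` is hypothetical, and it relies on the standard `AtomicLong.lazySet` for an ordered store. Because exactly one thread ever writes, the update is a plain read followed by a store, with no lock and no CAS retry, while any number of other threads may read.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: one owning thread writes, any thread may read.
class SingleWriterSketch {
    private final AtomicLong counter = new AtomicLong();

    // Must only be called by the single owning thread: no CAS, no lock.
    void increment() {
        counter.lazySet(counter.get() + 1);
    }

    // Safe to call from any thread.
    long read() {
        return counter.get();
    }

    // One writer thread updates while the calling thread acts as a reader.
    static long run(long updates) {
        SingleWriterSketch shared = new SingleWriterSketch();
        Thread writer = new Thread(() -> {
            for (long i = 0; i < updates; i++) shared.increment();
        });
        writer.start();
        long last = 0;
        while (writer.isAlive()) {
            long seen = shared.read();
            if (seen < last) throw new IllegalStateException("reader saw the value go backwards");
            last = seen;
        }
        try { writer.join(); } catch (InterruptedException e) { throw new IllegalStateException(e); }
        return shared.read();
    }

    public static void main(String[] args) {
        System.out.println(run(1_000_000));
    }
}
```

If a second thread called `increment()`, updates could be lost; the design only works because ownership of the write is fixed.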

4. You mention the fact that essentially you have everything running in memory. It’s kind of an obvious question but what do you do if everything crashes?

That is why you want to have it replicated and saved to disk, which is part of why the Disruptor grew the way it grew. So we follow an event sourcing pattern; this is the type of system that has been around for a long time; so the likes of Tandem computers in the 1960s and 1970s... or, sorry, the 70s and 80s onwards... were following those principles, IMS followed those principles from the start. If you journal all of your inputs into a system, you can recreate system state at any given point in time by replaying them. Now that would be a slow way to restart a system but it’s an ultimate backup. If you take the same stream of events and you send them to multiple servers, you now have multiple copies of it running in memory; if one node dies another one can take over straight away, and it's already got everything hot in memory. One of those streams may even be to a remote datacenter so another datacenter can take over in the event of complete datacenter failure.

5. But I guess the core of that then is that the current state of the business logic processing can be entirely derived by reprocessing the input?

Yes, that is key. You must have a completely deterministic re-play of events.
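A minimal Java sketch of the journal-and-replay idea discussed in the last two answers is shown here. Everything in it is a made-up example (the `Event`, `Accounts` and `Node` types and the account names are hypothetical); the point is only that when the business logic depends on nothing but the ordered input events, replaying the journal rebuilds exactly the same state.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of event sourcing with deterministic replay.
class ReplaySketch {
    // An input event: a credit (positive) or debit (negative) on an account.
    static final class Event {
        final String account;
        final long amount;
        Event(String account, long amount) { this.account = account; this.amount = amount; }
    }

    // Business logic: state is a pure function of the events applied, in order.
    static final class Accounts {
        final Map<String, Long> balances = new HashMap<>();
        void apply(Event e) { balances.merge(e.account, e.amount, Long::sum); }
    }

    // A node journals every input before applying it to its in-memory state.
    static final class Node {
        final List<Event> journal = new ArrayList<>();
        final Accounts state = new Accounts();
        void onEvent(Event e) {
            journal.add(e);
            state.apply(e);
        }
    }

    // Recovery: rebuild state from scratch by replaying the journal.
    static Accounts replay(List<Event> journal) {
        Accounts rebuilt = new Accounts();
        for (Event e : journal) rebuilt.apply(e);
        return rebuilt;
    }

    public static void main(String[] args) {
        Node primary = new Node();
        primary.onEvent(new Event("alice", 100));
        primary.onEvent(new Event("bob", 40));
        primary.onEvent(new Event("alice", -30));
        Accounts recovered = replay(primary.journal);
        System.out.println(recovered.balances.equals(primary.state.balances));
    }
}
```

Anything non-deterministic inside `apply` (reading the clock, random numbers, iteration over unordered data that affects results) would break this property, which is why such inputs would have to arrive as events themselves.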

6. I’m interested in how you got to that design. I’m presuming the approach involved quite a lot of experimenting and running different, you know, trying different approaches?

A lot of those designs, I was familiar with them for a long time. Back in the 1990s I had the pleasure of working alongside one of the chief architects of Tandem, so I was exposed to these types of designs early on in my career and I’ve built many systems since. From what I’ve seen at Betfair, and what I’ve seen with other exchanges, this sort of thinking is becoming quite common. When you have got incredibly high write rates or transactions that are contending on the same data, which is common within financial exchanges or sports betting exchanges, all of that activity tends to be on one particular event or one particular instrument. If you take sports betting for example, it’s all going to happen on the big game that is happening right now; also games are not scheduled to conflict with each other because of the viewing side, so you can’t cluster that and make it go wide because all of your data is contending on the same thing. Same thing happens in finance, you’ll find that particular financial instruments or companies are traded much more than others, so you have to work out, "How do I solve this problem in a serialized fashion? I just can’t shard it and make it go wide"; and those types of designs have been around for a while. I’ve seen many of the exchanges now globally since LMAX, and even during my time at LMAX, and all of the designs end up being incredibly similar. A lot of it is the devil in the detail: how well people execute on these designs ultimately determines how well they perform. You can take exactly the same two designs, but one is a bit more sympathetic to memory layout or how many branches there are in the code, and as a result gets significantly different throughput and performance characteristics. But it's still the same basic design; they all perform very well, it’s just whether the performance is absolutely stellar.

7. And is that, sort of, understanding the hardware platform that you are running on? That's what you mean when you talk about mechanical sympathy I guess?

Very much so. The term mechanical sympathy is blatantly stolen from Jackie Stewart, the racing driver, who said that to get the best out of any car, you have to have a sympathy for how it actually works and then you can work in harmony with it.

8. Do you think the problem of developers maybe not understanding the hardware that their programs run on, is something that's got worse over time, as we’ve got more abstracted away?

Interesting the choice of “worse”; I think people aren't even aware of it as an issue. I cut my teeth computer programming in the 1980s, back whenever we had ZX80s and 81s, and Spectrums and all of that [My Era as well, yes]. You had to deal with memory directly. I remember programming in assembler and then C came along and I thought, “Oh yes, C: a language in which I can actually write function calls without all of the pain”. But you became intimately aware of memory: how much memory you had, how restricted it was and how it worked, otherwise you couldn’t make these things perform. I think it’s a great sort of learning bed that we seem to have lost. We're not going to have another era of the 1980s again, and most of the developers I know who are very good at this, they were all growing up during that era and they experienced that. Or since then, they have worked for people who were in that era, and worked closely with them, so they’ve kind of learned their craft under those sorts of people.

9. I guess very restricted devices, things like the Raspberry Pi and stuff like that, maybe helps?

Helps to an extent, but still quite a powerful device for what most people are actually doing. It’s interesting the languages that people choose. Java is my predominant language, it's what I use most of the time. I have over 10 years of C and C++, and I’m very comfortable with it, but I like some of the higher level productive languages. But having that history in C and C++ gives me a much better understanding of how to get better performance out of Java. I believe that most people should be polyglot in the languages they program in, and most people should spend some time working in a language that directly manipulates memory, because you just get a much better understanding of it.

10. That’s very interesting. I wanted to talk a bit about Performance Testing because I know that's an area that you are very interested in. So I guess the obvious thing to start with is, when do you start doing Performance Testing?

Well, how most people do it is they wait until they go live and then discover that they have a problem. If they are lucky, they might run a performance test before they go live and then go, “Uh-oh; time we actually fixed this”. The truth is, it's like any Agile development: you are much better off if the feedback cycle is shorter. The sooner you know you’ve got a problem, the sooner you can fix it, and the more likely you are to know exactly how to fix it. I’m a great believer that you start with Performance Testing. Now, not every application needs to be performance tested; if you are going to write a script to do a simple data transform and you are going to run it once, you don’t need to have Functional Tests and Performance Tests; they have to be appropriate for what you are working on. But if your system is significant, and it’s critical to a business and its performance makes a difference to a business, you need to have the performance correct. And you can agree what the performance characteristics are with the business upfront; that is not always easy, but you need to work on that, and you get a better understanding of the business. But starting from the beginning, I find, gives you much better results and a much greater understanding. Over the course of projects I actually believe people go faster in their ability to deliver by having Performance Tests in place, because their understanding is better. It’s a bit like debugging as you go along; you actually hone your skills at debugging. I find it’s one of the greatly over-looked areas: we aren't taught debugging at University, we aren't taught profiling at University; very few people actually know how to do this stuff well, and one of the best ways is just to do it often. Anything that is hard, do it often; you get better at it, you get faster and you learn a lot from it.

11. Would you include Performance Testing as part of your continuous build cycle?

Yes, very much so. I think that is all just part of the pipeline. You are going to have commit tests, you are going to have acceptance tests, you are going to have performance tests, you want to have soak or endurance tests. You want many other types of tests along your CI pipeline because then you find problems quickly; it’s all about that feedback cycle of knowing as soon as you have broken something, and you know what is likely to have broken it because you’ve got the change that's just happened recently.
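One way a performance test might sit in a CI pipeline alongside the other tests is sketched below in Java. This is an assumption about how such a check could look rather than anything described in the interview: the class `LatencyCheckSketch`, the nearest-rank percentile and the idea of a fixed latency budget are all illustrative choices, and a real test would also need to control warm-up, machine noise and coordinated omission far more carefully.

```java
import java.util.Arrays;

// Hypothetical sketch of a latency check that can fail a build.
class LatencyCheckSketch {
    // Nearest-rank percentile over recorded latencies (p in the range 0..100).
    static long percentile(long[] samples, double p) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(0, rank - 1)];
    }

    // Times each call of the operation in nanoseconds after a warm-up phase.
    static long[] measure(Runnable operation, int warmup, int iterations) {
        for (int i = 0; i < warmup; i++) operation.run();
        long[] nanos = new long[iterations];
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            operation.run();
            nanos[i] = System.nanoTime() - start;
        }
        return nanos;
    }

    // Throws, and so fails the pipeline stage, if the percentile exceeds the budget.
    static void assertWithinBudget(long[] samples, double p, long budgetNanos) {
        long observed = percentile(samples, p);
        if (observed > budgetNanos) {
            throw new AssertionError("p" + p + " was " + observed + "ns, budget " + budgetNanos + "ns");
        }
    }

    public static void main(String[] args) {
        StringBuilder sink = new StringBuilder();
        long[] samples = measure(() -> { sink.setLength(0); sink.append(42); }, 10_000, 100_000);
        System.out.println("p99 (ns): " + percentile(samples, 99.0));
    }
}
```

Checking a high percentile rather than the mean fits the concern about latency spikes raised earlier in the interview; the budget itself would be whatever has been agreed with the business.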

12. You mentioned the idea of writing tests first, and performance tests first, so presumably you are expecting the developers to write the performance tests?

Yes, I think it’s an anti-pattern when you use a separate performance team. It’s a bit like having any sort of separate team: it makes people say testing is completely another team’s responsibility. Everyone is responsible for the quality of the code that they produce; you can have specialists that can help you write better high performance code. I like it when people write the Performance Tests themselves; that gives them a deeper insight into how their code works and what it's capable of doing. Just that knowledge helps you develop better code for other things that you're doing, and gives you this sort of innate idea of the cost of doing things. If you want to do R&D on it, I think that is an interesting thing, so you may want to try, say, a new network card, new products, new databases, whatever: how are they going to perform, how are they going to give you the characteristics you need? It's kind of nice to do a little R&D exercise on the side and then take that knowledge back into the team, but I think it's important the team own that. Generally if everybody can change any part of the code base, scheduling is not so much of an issue, all the responsibilities are the same. I find that when people are thinking, “That’s not my responsibility”, they kind of abdicate and they walk away from it, and I think that becomes dangerous.

13. So trying to draw these strands together a bit, we are trying to write code that is sympathetic to the hardware that we are running on, we are interested in getting developers writing Performance Tests, presumably then the hardware that we develop on becomes quite important, i.e. do you want the box that you are developing on to match, or approximate to, the production environment?

Yes. If you look at the history of computing and how people are working, it used to be that you developed on the boxes that you actually ran on - you developed on a mainframe, you ran it on a mainframe; the characteristics were the same. Same with the mini computing age. The micro computing age changed things interestingly, because the machine you develop on tends not to be the machine you run on. That was kind of classic and understood, because let's say we were developing client server applications, you typically had a client you were developing on, and you had a server. The internet age has changed things in a fascinating way, in that we’ve now got all these heterogeneous clients and we tend to pretty much have Linux servers; we have other types, there are Windows servers, but Linux is the predominant server platform. It’s so common, we see people developing on MacBooks; great, good fun, nice hardware to do it on, but it behaves differently from those Linux boxes. Now that's going to behave quite differently from a performance perspective, but also quite differently from a functional perspective. I’ve seen so many people chase bugs on servers where the VM that they... say they are developing in Java on a Mac... behaves very differently from how the VM behaves on Linux. So it's not even just a performance issue. Then you've got all of the other non functional issues: you've got the security issues, the performance, the quality of service, there are lots of ways they behave differently, and people don’t have a feel for that. Who fixes that problem in production when they’ve got no experience of it? You tend to have an ops team who have some idea about that, but have no idea about the original code. It seems like a real disconnect to me; it doesn’t seem right as a way of working.

I’d go one step further than that; most people now are developing on laptops, and laptops are great, I’ve got two myself and I use them a lot when I’m on the road, but when I’m actually doing more sustained development for a long period of time, I like to be working on a desktop pairing station with another person, if I’m not on the road. And that pairing station I like to be very like live: I like it to be multi socket, I like it to be NUMA - Non Uniform Memory Access; very like a production server. These things are so cheap now. You go back 5-10 years, there was a good reason to have a server hardware platform that was very different from a development hardware platform. But now you can buy these machines for a few thousand dollars, a few thousand pounds; they are not expensive. You can be running on those, and you’ve got other great benefits. When we start building, we can be running all of our unit tests in parallel on these boxes with lots of cores that behave like a server. We learn so much more about our applications, we know the platform much more intimately, so whenever you are called out to fix that production issue, you are not on an alien OS on an alien piece of hardware. It’s comfortable, it’s the comfortable slippers you use every day. That should be good service to the business, that we are used to those sorts of things. It seems crazy that this shiny machine is so different from what we are actually deploying on.

Charles: I think that’s a really good place to end it actually. Let's close there. Thank you!
