
Mechanical Sympathy Panel



Howard Chu, Michael Barker and Aaron Bedra discuss modern hardware, the options it enables, the skills needed, and what to expect in the future.


Howard Chu is Systems Level Developer & CTO @SymasCorp. Michael Barker is Software Engineer & Independent Consultant at Ephemeris Consulting Ltd. Aaron Bedra is Senior Software Engineer @drwtrading.

About the conference

QCon Plus is a virtual conference for senior software engineers and architects that covers the trends, best practices, and solutions leveraged by the world's most innovative software organizations.


Montgomery: This track isn't really just about performance, it's not really performance directly. It's working with hardware. To me, that's pretty broad. I deliberately tried to get talks that were a little bit different, such as Aaron's talk on hardware enclaves, and the Netflix workstation talk. We wanted to look at other aspects. I'm fairly transparent. I believe that the idea of mechanical sympathy and working with the hardware goes beyond just performance concerns, goes into quality, it goes into security, and things like that. I just wanted to see if you disagree with me, or if maybe I'm not looking at it quite the same as you, just wanted to get your opinions.

Bedra: I definitely agree. I think that hardware offers a lot of things, and mechanical sympathy or being sympathetic to the architecture and the complete offering of the hardware is important for designing systems. It's not just about efficiency and performance, it's about using the hardware as much as possible to achieve the design goals you have in mind.

Barker: I think, overall, it is largely about one of the things that you mentioned about quality. I think performance is essentially a quality metric. A lot of the problems we see with a lack of mechanical sympathy do manifest as performance issues. That tends to be where I've focused. I think that's the key point. Certainly, organizations I've worked for in the past, prior to working in the financial industry, have not taken performance seriously enough as a quality metric or thought about it in those sorts of terms. Nobody likes slow software. Everybody hates it when their PC is slow. Everybody hates when a page is slow to load. If you're a big project with 100 people or something like that, very few people are actually caring about how it goes. A lot of that's just like a disrespect for the users. It boils into a lot of things that way, I think.

Chu: I have a lot of reactions to this, starting from the top down. We see these as performance problems, but at their heart, it's about efficiency. We want to use the hardware as effectively as possible. I'm relieved, actually, to see that more attention is coming to this nowadays. I just saw the other day an announcement about a green computing initiative. People are finally starting to realize, these data centers are costing us a lot in terms of carbon emissions and all this other stuff, and everything we do that's less efficient is exacerbating the problems that we're dealing with. From that perspective, absolutely. We have to understand how the hardware works best, so that we can make the best use of it. Then we get good performance as a result of that, and we get better environmental friendliness and all these other good things. Yes.

Waste and Concerns Associated with Crypto Mining and Blockchain

Montgomery: I can't agree more. In fact, I've talked a lot about efficiency as opposed to performance, because I think efficiency is a much better way to think about it. With that, I want to get your all's thoughts on the concerns around crypto mining and the blockchain, and all the waste associated with that. Is this something that, making more of a mechanical sympathy type of approach, even from the networking side, can have a noticeable impact on that? Legislatively, we could say no, but realistically, we can't. Then, how do you deal with that and be more mindful of usage?

Chu: That's perfect considering that I'm currently working on a cryptocurrency project. In fact, I designed the latest proof-of-work algorithm that we're using, and it's very crucially dependent on mechanical sympathy. What's in the media attention right now, of course, is Bitcoin because it is the first crypto and it's the largest, and its mining resources are the most extensive, and I would say the most expensive as well. The idea behind crypto or blockchains is you don't want to have any trusted third parties. You want to have a distributed system, where not just operations are distributed, but control is distributed. It's got to be decentralized.

We come up with Nakamoto Consensus, which lets us do mining through proof of work to validate the true state of the system. In that respect, I think it's a necessary evil. We talk about how much power it consumes and all of that, but when you say, here are the conditions you're given: it must be decentralized, it must be trustless. There aren't a lot of other ways to solve this problem besides proof of work. Proof of stake is mentioned, but it doesn't have the same security guarantees. From the very beginning, we have to say, if this is your goal, that you want a decentralized system, then proof of work is just part of the package. You can't get away from that.

As for the overall efficiency of the system, Bitcoin works with dedicated hardware, ASICs for mining. In my opinion, this is very wasteful. The chips themselves can be very efficient. They can generate millions of SHA-256 hashes per second, or per watt, or whatever metric you want to use. Because they're single purpose, to my mind, they're very wasteful. They can only do one thing. When the next generation of faster ASICs comes around, the current generation gets put on the scrap heap. They're useless. In that respect, there's a lot of waste. From the electrical consumption side, I can't consider that waste. That's their job. Their job is to secure the network by burning energy, and the network becomes secure because nobody else can burn more energy than you, to try and subvert it. There's a lot to balance out there. Overall, the fact that proof of work consumes energy, that is its job. That is how it secures the network.

For example, I work on Monero, and we developed this RandomX mining protocol, which is based around general purpose CPUs. To my mind, this is a little bit more environmentally friendly, because the CPUs that you use for mining don't just become obsolete overnight. If you decide, it's no longer cost effective for you to use it for mining, it's still useful as a CPU. You can put it in another PC, and you can still get useful work out of it. In our perspective, that's a better approach.
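The proof-of-work idea Chu describes, where a network is secured by making participants burn computation, can be illustrated with a toy sketch. This is not Bitcoin's actual scheme (which double-hashes a block header) and it is not Monero's RandomX; the function names, the 8-byte nonce encoding, and the `difficulty_bits` parameter are assumptions made purely for illustration.

```python
import hashlib


def _digest_value(block_data: bytes, nonce: int) -> int:
    """Hash the data plus a nonce and interpret the digest as a 256-bit integer."""
    digest = hashlib.sha256(block_data + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big")


def mine(block_data: bytes, difficulty_bits: int) -> int:
    """Search for the first nonce whose digest has `difficulty_bits` leading zero bits.

    Each extra bit of difficulty doubles the expected number of hashes,
    which is where the energy cost comes from.
    """
    target = 1 << (256 - difficulty_bits)
    nonce = 0
    while _digest_value(block_data, nonce) >= target:
        nonce += 1
    return nonce


def verify(block_data: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Checking a claimed solution costs a single hash."""
    return _digest_value(block_data, nonce) < (1 << (256 - difficulty_bits))
```

The asymmetry is the point of the sketch: `mine` needs thousands of hashes even at a low difficulty, while `verify` needs one.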

Barker: I think, taking a completely different tack. I think what's interesting about where you observe the places where people apply mechanical sympathy, it tends to follow the economics. You can look at Bitcoin, the faster you can churn through it, the amount of cycles you can go through, so they've got all the way to ASICs. In terms of what Howard says about waste of materials is true, but in terms of efficiency and processing, it's about as fast as you can get. You see it in exchanges, because having a fast exchange is always beneficial. Trading platforms, games, the better they use the hardware, the better the economics. You sell more games. You do better trades. You get more customers, all that sort of thing.

If the economics is structured right, so for example, if there was a cost to burning too much energy, if there was a cost to too much wasted material, however the economics is structured, I think the software and the systems and the mechanical sympathy will follow. That's the tricky thing. It's difficult to do on a micro scale. If at a macro scale, economics was structured such that efficiency was the number one thing you're chasing regardless of the software you're building, you'd see it happen a lot more. If I'm building a website, and the best way to monetize that meant that it had to be efficient. I don't know quite exactly how that works. For example, I have to pay a tax on every user, per amount of watt used or something like that, you would see mechanical sympathy being driven through a lot more. I think it's very much an economic thing.

Bedra: I spend my entire day on a cryptocurrency trading desk. It's definitely first class for me. There's a few things to consider here. Any emerging market, emerging technology, emerging idea is going to go through a conceptual phase, an adoption phase. Eventually, once it gets big enough, a phase where it revisits everything and starts to tune and become better, become more efficient. I think we're starting to enter that phase. That could change the market structure. It could change things. I really don't want to comment on that right now. What I think is more important is that I think being a good participant in the cryptocurrency space also means being conscientious about environmental concerns. Part of this is just, it should go hand in hand with being a good participant.

DRW has recently announced that we're purchasing carbon offset credits, different things to try to help participate at a global scale, a more interesting scale to make sure that we're keeping aware of the environmental cost of what we're doing. I think part of being a good citizen these days is just being aware of that cost. I expect to see a lot more interesting things develop, renewable resources, renewable energy sources. I think crypto is also driving some interesting research around how do we make these renewable resources a fundamental part of cryptocurrency in general. I think there's great aspects coming out of that, that I hope to see come to fruition down the road.

Mechanical Sympathy and Developer Productivity

Montgomery: Shifting a little bit away from the crypto angle. I'm wondering, have any of you ever seen any mechanical sympathy concerns or performance concerns. It doesn't have to be specifically to performance. Have you seen it impact developer productivity, negatively or positively?

Barker: There's a couple of ways that I've seen performance concerns and efficiency affect developers. There's about four or five completely different ones that I wanted to mention. The clearest and most obvious one, certainly it's come around in the last 10 years or so, is the performance of build environments. Continuous delivery, the idea that you're just constantly trying to maintain a continuously green build, every change is being built, and you're moving through that very quickly, has meant that you need to have a very efficient system for essentially getting through your build and testing. I remember the system we built at one of my previous jobs, it started out as essentially one build on one machine running all of the tests and all the acceptance tests. This was a system with a huge number of full-on end-to-end tests with Selenium and all those sorts of things, and at one point, the time taken to get from start of build to end of acceptance test was about three or four hours. At previous places, it would take two weeks to get through your testing phase, but even at three or four hours, the ability for the developer team to move forward quickly and safely improved as we chunked that down. Certainly, when I left, it was probably much closer to about 30 minutes. Just that difference in scale means you can push so many more changes in and be really confident about those changes going through.

Probably that's one of the ones I've seen that's the most important. That's not just about efficiency or mechanical sympathy, that plays into it, but that's also about architecture and scaling as well. Being able to scale out that system to move yourself forward quickly. Not enough attention is paid to that. I don't know how many people sit there and try and write tests that are fast. Quite often people just go, I'll just whack this test code in and just let it run. Java is not a great one. Java is great when you run the same thing over again. How many times do you run the same unit test in a test run, like once? The JIT hardly has a chance to do anything with it. The same with build systems, and all that sort of thing. That's certainly one of the big ones that's cropped up for me a lot.

Bedra: I echo everything that was just said. I think the development environment, think about how often you are waiting potentially for your computer to catch up to what you're doing. Even if you're just [inaudible 00:15:53] conscious, you have an idea. You're expressing it. You're waiting for your IDE to be as fast as your mind can push things out. Even that little bit of wiggle that little bit of jitter, it's a cognitive disruption. It's stopping you from doing what you're doing. Even little things like that add up over time. I think there are ways we can deal with that. I know in the Java space, we make a very hard attempt to go towards effectively removing effects. Parametric polymorphism, at the foundation of everything we do, such that no one class demands a particular effect. You can say, I'm not going to do any intentional IO here. I'm going to put it into a little structure, and I'll replay that afterwards in a test. Just remove all the external dependencies, making sure your test suite runs as quickly as possible. Also, it can be tested at the various levels. For us, a fast test suite, absolutely critical to do what we're doing. It's a key player.

I find that once you start to tease apart those types of interactions, removing the effects, removing the things that are potentially slow, it also gives you really nice sandboxes for when you need to go fast. Everything can be constructivist and correct around the shell, and those little tiny pieces that really need to have extra bits of maybe mutability, or performance eked out, they can be cordoned off and isolated. They're very separate from the rest of the correctness and quality parts of your system.
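One way to picture the "put the IO into a little structure and replay it afterwards" idea is to have the core logic return descriptions of effects rather than perform them. The sketch below is a hypothetical illustration, not the panel's code: `WriteLine`, `plan_report`, and `run` are invented names, and the real systems discussed are in Java rather than Python.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class WriteLine:
    """Describes an intended write instead of performing it."""
    path: str
    text: str


def plan_report(orders: List[int]) -> List[WriteLine]:
    """Pure core: decides what should be written, performs no IO."""
    return [
        WriteLine("report.txt", f"orders={len(orders)}"),
        WriteLine("report.txt", f"total={sum(orders)}"),
    ]


def run(effects: List[WriteLine], write: Callable[[str, str], None]) -> None:
    """Thin shell: the only place where effects are actually executed."""
    for effect in effects:
        write(effect.path, effect.text)
```

In a test, `plan_report` can be checked by comparing the returned list directly, and `run` can be handed a function that records calls, so no file system or network is touched and the suite stays fast.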

Chu: Everything you said about the test suite is really hitting home for me right now, because we've been dealing with the OpenLDAP test suite. It's grown to about 85 shell scripts that all run serially, and it's starting to take a long time. The other thing, too, is we've been working on scalability and efficiency for literally decades now. The interesting thing now is it takes a lot of work for us to actually test this at its limits. Now we need a lot more CPU cores and whatever, to actually find out where the sharp edges are. That's new. Back 15 years ago, a dual core machine was enough. We could bring a machine to its knees and say, we have a problem here. Nowadays, it takes a lot more effort to find those problems. The other thing, I don't know if the tools for analyzing concurrency have really improved all that much. Maybe you guys have seen better ones, but it feels like it hasn't.

Montgomery: No.

Bedra: It remains a dark art, unfortunately.

Barker: I think certainly Todd, and some other people will work out this, but we just take the simplest approach to concurrency we possibly can, we just pass a message. Don't share anything. This is really fast. We don't have to think about it too hard.
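The share-nothing approach Barker describes can be sketched with two queues and a single worker thread that owns all of its state. This is a minimal illustration using Python's standard library, not how Aeron or any system mentioned in the panel is built; the function names and the `None` shutdown sentinel are assumptions.

```python
import queue
import threading


def worker(inbox: queue.Queue, outbox: queue.Queue) -> None:
    """Owns its own state and talks to the outside world only via messages."""
    running_total = 0  # private to this thread, never shared
    while True:
        message = inbox.get()
        if message is None:  # sentinel: report the result and shut down
            outbox.put(running_total)
            return
        running_total += message


def sum_via_messages(values) -> int:
    """Send each value as a message and collect the worker's final answer."""
    inbox: queue.Queue = queue.Queue()
    outbox: queue.Queue = queue.Queue()
    thread = threading.Thread(target=worker, args=(inbox, outbox))
    thread.start()
    for value in values:
        inbox.put(value)
    inbox.put(None)
    thread.join()
    return outbox.get()
```

Because `running_total` is only ever touched by the worker, there are no locks to reason about in the application code; the queues are the only points of contact.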

Mechanical Sympathy's Relevance to All Programming Languages

Montgomery: When you're working closer to the hardware or you're working closer with some considerations of what the hardware is doing, there's always language questions that come up. The first part of this would be, is mechanical sympathy relevant to all languages? I think we would all agree it probably is. I don't think that's too much of a stretch. Whether you're doing it in Python, or you're doing it in Java, or any other type of language, you still can and should, if it's right, be thinking of the mechanical sympathy aspects that you're looking at. There's something else here too that comes back to the comment that Mike made about the resurgence of things like C, that we're starting to see now that C is a language on the move. It's progressing faster than C++ is progressing. I just want to get your thoughts on, is it harmful for languages like Java and others? It's really work to have some of these aspects. You're forbidden from actually using some of these aspects, like some of the network APIs that Mike brought up. What do you think? You were talking about the resurgence of C, do you think that could hurt other languages? Maybe it may not hurt them, but do you think that we're going to see a shift, or could potentially see a shift for those who care?

Barker: It's interesting. I'm a latecomer to C, which is odd. That's what I've spent the most time learning, certainly over the last five years is programming in C. The reason I see it coming about is a lot of this new hardware coming through, and a lot of this need to gain efficiency. Linux tends to be the first place that most of these things happen, because it's probably the most hackable system that you've got. It's written in C. It's the easiest way to interact. I need to create a device driver that's going to be able to allow me to share some addresses into user space so I can map some memory, and then access this bit of hardware really efficiently. Which is pretty much what DPDK does and a bunch of other things do as well. It's such a lingua franca. It's common across all things. For all its warts, and the things that aren't so great in it, it's really straightforward.

One of the things I do find interesting in some of the newer languages is actually using some of those concepts when I write C, because you can write C in any way you like. I was recently dealing with ownership and memory allocation type stuff, and I got it wrong, and there was a double freeing and all that sort of thing. I actually took a step back and went, how would this work in Rust? How would the borrow checker manage ownership here? I actually wrote it in a manner that modeled what the borrow checker in Rust would have done with the ownership of this particular object, and it fell out quite nicely. I think there's interesting concepts in some of these high level languages about how they manage these resources in a safe way. Everything about C is about technique, it's about how you do it to make it work well. You start applying some of those things as well. That's certainly what I've experienced in the last few years as well. An intermediate C level developer is probably where I'd sit myself, under the others.

Chu: I've been using C as my primary language for something like 25 or 30 years. When I'm forced to use something like Java or Python, I always find myself limited because they do their best to hide the hardware from you. The growing focus on efficiency means you can't afford to have these things hidden from you anymore. Modern architectures don't even work the way these language VMs would have you believe. You're not even talking about a single monolithic CPU anymore. There's all these different subsystems that are involved in the work that you want to compute. If you're going to make good use of them, you need low level access and low level visibility into how the system works. C lets you get that, and other languages deliberately try and hide that from you.

Bedra: I tend to spend a lot of time thinking about determinism as a first-class function of designing software. A couple things fall under that. One is lifecycle, which we've talked about a couple times, lifecycle management. Whether that memory model is written and encoded into your program in the case of C and C++, or if it is part of a virtual machine, or it's something else. Being aware of that lifecycle and being able to control it, be deterministic about it, I think goes a long way as well. If you're in a virtual machine environment like Java offers, you want to know, are you creating garbage? Why are you creating garbage? Is it going to be promoted into a generation that's going to cause pauses? How is that garbage collection handled? Every time you're not thinking about that, you're introducing inefficiencies that in micro are trivial, but when you scale a system are catastrophic. I think determinism and essential algebra is a really important part of this.

Oftentimes, we do way too much algorithmic complexity, computational complexity. We're doing horribly inefficient algorithms, data structures. I think getting better at dealing with that, making a data structure that's efficient for what you're really trying to solve, and not saying, everything's a HashMap, or everything's an array. Getting that stuff right, and nailing that stuff, and then using that to build an algebra of how these things compose, and how they come together, and how they're deterministic. It doesn't matter what language you're in at that point, what matters is you have a concrete expression of your problem. I think that really lends itself well, no matter what language you're using, to some mechanical sympathy, or at least some efficiency. I've seen lots of things abused in the name of, "I only have this tool so I don't want to bend this far." I think a lot of times it ends up biting you.

Barker: There's an interesting follow-on point to that, because there's always that tradeoff between how quickly can I get to a working solution, versus how much cost has gone into dealing with the language. There's a lot of problems for which the actual domain is really quite complicated. I've worked on a broker before. Broker is quite a complicated system, you've got lots of relationships between lots of different things for risk management, all that sort of thing. This was built with Java. I would be very nervous trying to write in C, because it requires collections of different structures and things like that, and certainly would take me a really long time to write one that worked safely. Yet, I also worked on an exchange, and actually, I'd be pretty ok with writing an exchange in C, because they're a lot simpler. The data structures are reasonably straightforward. They're a system that have been hammered out, a number of people have written order books and matching engines, they're a very well understood problem. Same with Aeron. Aeron under the covers, the data structures are straightforward, building the messaging side, pretty good. Aeron cluster, on the other hand, maybe we want to keep it in Java for a while until we really fully understand exactly how that should work. There are those tradeoffs there as well.

Tuning For Low Latency

Montgomery: A question that came up actually has to do with NUMA. Although throughput has increased over the past decade, what are your thoughts on the increased difficulty in tuning for low latency specifically? I mention this because modern CPUs are moving from predictable monolithic architectures to a modular MCM approach. On the high core count AMD Zen architecture, for example, communication between cores on separate CCX/CCD incurs a rather large latency penalty. Intel's newer CPUs experience something similar doing cross communication with cores that are further away. Do we run the risk of optimizing too much? This is a very interesting take on the NUMA architecture. I'm just curious what your opinions are.

Bedra: Yes, you can optimize too much. Hundred percent, yes. I see it all the time. First, make your software correct. Make it deterministic. Make it tested. Oftentimes, in the true expression of testing, you'll find a lot of very interesting things that you do incorrectly, if you're testing right. It's easy to find concurrency bugs in testing if you do it right. Performance certainly, there's some science, there's also a little bit of art as well. Trying to nail that right without a foundation, you're going to end up incorrect. Measurement always. That has to come first. That has to be the thing you do before anything else. Good tests, good measurement, good foundation, and then make it fast. Then worry about eking out the extra bits of performance you need, whether that's a few milliseconds or a few nanoseconds depends on what you're doing, what problem you're solving.

Know what you're shooting for. Know what your targets are. Know, not just where your floor is, but what your tolerance is. What are those tail latency requirements you can tolerate? Can you be ok with a little bit of jitter at the expense of maybe a lower floor? Is a very deterministic output or throughput really important to you? There's a lot of different things to consider there. I think we get this wrong a lot. We as the industry, we get this wrong, because we try too quickly to eke out a bit of performance improvement without really, truly understanding what we're doing along the way.
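The distinction between the floor and the tail can be made concrete by summarizing a set of latency samples with percentiles rather than an average. The sketch below uses the nearest-rank definition of a percentile; the function names and the sample values are invented for illustration.

```python
import math


def percentile(samples, p: int):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(latencies_us):
    """Report the floor, the median, and the tail of a list of latencies in microseconds."""
    return {
        "min": min(latencies_us),
        "p50": percentile(latencies_us, 50),
        "p99": percentile(latencies_us, 99),
        "max": max(latencies_us),
    }
```

For a hypothetical run of 98 samples at 100 microseconds plus one at 120 and one at 5000, `summarize` reports a floor and median of 100 but a maximum of 5000, which is the kind of jitter a mean or a floor alone would hide.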

Chu: That sounds like the whole argument against premature optimization, which is certainly a valid one. I was wondering if this particular question is after you have a correct system, and after you know that it needs to be tuned, then you're no longer in the premature phase.

Montgomery: There is that premature thing of jumping in and pinning threads and taking a look at the system without looking at the whole system, just doing it because you know you're going to need to do it. That I think can be dangerous, because that leads you down certain paths. The way to really break from that is to look at things such as with measurement, looking at what you know the data flows are going to look like, things like that.

Barker: You mentioned pinning, that's a common one, certainly with some of the consulting. You go in there, and they're going, "Here's our system, we would like some advice on how to pin the threads to various cores to make the system faster." You just go, ok, that's the last thing you do. Let's just run a profiler first and remove the logging, make the logging async. That's going to give you your biggest performance win, for example.
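As a rough illustration of "make the logging async", Python's standard library can move slow log handling off the hot path with `logging.handlers.QueueHandler` and `QueueListener`. This is a sketch of the general technique in Python, not the setup of any system discussed here; `make_async_logger` is a hypothetical helper name.

```python
import logging
import logging.handlers
import queue


def make_async_logger(name: str, slow_handler: logging.Handler):
    """Return (logger, listener); the calling thread only enqueues records.

    The potentially slow handler (file, network, console) runs on the
    listener's background thread instead of the caller's thread.
    """
    log_queue: queue.Queue = queue.Queue()
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep records away from synchronous root handlers
    logger.addHandler(logging.handlers.QueueHandler(log_queue))
    listener = logging.handlers.QueueListener(log_queue, slow_handler)
    listener.start()
    return logger, listener
```

Calling `listener.stop()` at shutdown drains the queue, so records logged before the stop are still delivered. Whether this is the biggest win in a given system is something only a profiler run can confirm.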

To go into low latency specifically. I've not worked a huge amount directly on the trading side, but a lot on the exchange side. Certainly over the 10 years I was working full time on an exchange, there was an interesting transition. Latency was originally just under a millisecond, and when I left, sub-100 microseconds was the standard. Once things got down to about that 100 microseconds range or below that, it stopped being about that race to the bottom, because a 50% gain used to be 500 microseconds, now it's only 50 microseconds. That's not that much faster. It certainly became apparent that what customers wanted once you get to that level was actually predictability. I think there's still people in the trading environment that want the absolute fastest thing you can get, and they're probably building ASICs. Certainly, where I see latency going, it's about predictability. This is much more around software than hardware. The hardware is generally very predictable. If you do the same thing, you're going to get roughly the same result.

That's why when I was talking about DPDK, it was so important that the long tail with larger messages was flatter. That's a big win. That's why feeling out ways that you can get to the hardware more efficiently matters: it means you're paying less software overhead. It's the software overhead, especially when you put it under load, and especially at the long tail, where it just goes weird, just strange things happen. You hit the edge cases with your data structures. You kick over queue buffer limits. You go over some threshold and things start queuing and backing up and drops and blocks start occurring. That for me is the low latency side. What people should be focusing most of their time on is, how do you make it predictable? Where are the boundaries? What are the throughput limits of various things? Then, how do you deal with those? How do you apply backpressure? How do you shed load to keep the system responsive and effective?
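Backpressure and load shedding both start from making the boundary explicit, for example by giving a queue a fixed capacity. The sketch below shows the two policies side by side; `BoundedInbox` and its method names are hypothetical, and a production system would pick its capacity from measured throughput limits.

```python
import queue


class BoundedInbox:
    """A fixed-capacity inbox that refuses work instead of growing without limit."""

    def __init__(self, capacity: int):
        self._queue: queue.Queue = queue.Queue(maxsize=capacity)
        self.dropped = 0

    def offer(self, item) -> bool:
        """Load shedding: never block; return False and count a drop when full."""
        try:
            self._queue.put_nowait(item)
            return True
        except queue.Full:
            self.dropped += 1
            return False

    def submit(self, item, timeout: float) -> None:
        """Backpressure: block the producer for up to `timeout` seconds, then raise queue.Full."""
        self._queue.put(item, timeout=timeout)

    def take(self):
        """Remove and return the oldest item; raises queue.Empty if there is none."""
        return self._queue.get_nowait()
```

With `offer`, the producer learns immediately that the system is saturated and can decide what to do; with `submit`, the producer is slowed to the consumer's pace. Either way, the queue depth and therefore the queuing delay stay bounded.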

How Junior Engineers Can Avoid Drowning on Teams Focused on Performance

Montgomery: High performance always seemed like a field that requires deep expertise and vast experience in many domains, storage networks, CPU, OS, programming language. Is it even possible to hire junior engineers to a team that focuses on performance? If so, any advice on helping them avoid drowning?

Chu: From what I can see, it does require fairly broad experience. You need to understand all of the components of your system. All the peripherals you're going to interact with, all of the remote networks that you need to interact with. It does require a fairly broad scope. Not just broad scope at a surface level, you have to understand each of those components at a fairly deep level. As a junior coming in, your only approach would be to bite off one piece at a time.

Barker: I think a lot of it comes down to, if you want to build a great performance team, probably one of the biggest personality traits you need from somebody who's interested in this is that curiosity aspect. The willingness to go, "Actually, I'm really interested in how this works. I'll get the job done, yes, but I actually want to peel the layers back and see." It's this curiosity aspect. As to bringing juniors on, I think there are some principles you can follow. What I would first do is, make sure they know how to write functional tests, because you can't optimize without knowing the system still works. Then writing some form of benchmark or load generation, and then profiling. I think, if you get those three things, and they've got that curiosity idea, then you point them in the right direction and say, have at it for a bit.
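The three skills listed here, a functional test, a benchmark, and a profile, can be practiced on something very small. The sketch below is one possible starting exercise, assuming a toy workload (summing squares) invented for illustration; it uses only `timeit` and `cProfile` from the Python standard library.

```python
import cProfile
import io
import pstats
import timeit


def sum_squares_loop(n: int) -> int:
    """Baseline: add up i*i for i in 0..n-1 one step at a time."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def sum_squares_formula(n: int) -> int:
    """Candidate optimization: closed form for the same sum."""
    return (n - 1) * n * (2 * n - 1) // 6


def check_equivalent() -> None:
    """Step 1, functional test: the optimization must not change the answer."""
    for n in (0, 1, 2, 10, 1000):
        assert sum_squares_loop(n) == sum_squares_formula(n)


def benchmark(fn, n: int = 10_000, repeats: int = 5) -> float:
    """Step 2, benchmark: best-of-`repeats` seconds for 100 calls."""
    return min(timeit.repeat(lambda: fn(n), number=100, repeat=repeats))


def profile(fn, n: int = 10_000) -> str:
    """Step 3, profile: return a text report of where the time went."""
    profiler = cProfile.Profile()
    profiler.enable()
    fn(n)
    profiler.disable()
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
    return out.getvalue()
```

The order matters: `check_equivalent()` runs before any timing, so a faster but wrong version is caught before its numbers are trusted.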

It becomes then domain specific. Are you dealing with storage IO, networking IO, in-memory type of things? Then there's some principles that crop up there as well. If you're dealing with IO, it's about cost amortization. How efficiently does the system work? What's its units of usage, block sizes, or frame sizes? How to tune and tweak those sorts of things. If it's largely CPU bound, you're probably talking about memory, so you can talk about concepts like spatial locality, and temporal locality, and common access patterns.
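Cost amortization for IO can be shown with a writer that collects small writes into larger blocks before handing them to the underlying device. In this sketch `CountingSink` is a stand-in for a real file or socket and simply counts calls; both class names and the 4096-byte block size are assumptions for illustration.

```python
class CountingSink:
    """Stand-in for a device or socket; counts how many write calls it receives."""

    def __init__(self):
        self.calls = 0
        self.bytes_written = 0

    def write(self, data: bytes) -> None:
        self.calls += 1
        self.bytes_written += len(data)


class BatchingWriter:
    """Buffers small writes and forwards them to the sink in fixed-size blocks."""

    def __init__(self, sink, block_size: int):
        self._sink = sink
        self._block_size = block_size
        self._buffer = bytearray()

    def write(self, data: bytes) -> None:
        self._buffer += data
        # Emit as many full blocks as the buffer currently holds.
        while len(self._buffer) >= self._block_size:
            self._sink.write(bytes(self._buffer[:self._block_size]))
            del self._buffer[:self._block_size]

    def flush(self) -> None:
        """Push out any partial block that is still buffered."""
        if self._buffer:
            self._sink.write(bytes(self._buffer))
            self._buffer.clear()
```

Writing 1000 messages of 64 bytes through a 4096-byte `BatchingWriter` reaches the sink in 16 calls instead of 1000, so whatever fixed cost each call carries is paid far less often. The right block size depends on the device and is something to measure rather than guess.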

I think there are some general principles that crop up that you can start with to get better performance. Then you need the person who wants to figure out why those things work well. Then they can peel the layers away for themselves and start learning all of those deeper level details. I think there's some general principles that you can start with. From a more organizational perspective, pair programming is probably one of the best ways I've seen to bring people on board really quickly. Get them next to an expert, tackle a problem hands-on. You can do that remotely, so many great tools, and Zoom, and various tools built into IDEs now. No excuse not to do that. Take that approach.

Bedra: I would say advice in both directions. If you're the one doing the hiring, be very clear that it's on you to not let them drown. Recognize the investment you're making. It's great to bring juniors on, recognize the investment. It's not the same as bringing somebody who already knows this stuff on. You need to be very conscious and very specific that we're going to spend the time to make sure they learn. If you're the junior looking to get in a team like this, make sure that you're going into an environment that's going to support you. I hesitate to say it sometimes, but I think it's very important to set the expectations upfront that there's a cost to this. This is a deep field. There's a lot to think about. There's a lot to consider. It requires a lot of subtlety, and a lot of nuance. You want to make sure that you're going to be supported doing it.

That being said, pairing, I absolutely agree with. It's a great way to do this. Also, the fundamental understanding of, how do computers work? How do networks work? Building those really common foundations so that you can start to reason about why something might be going on, and then narrowing it in. I spent a lot of my career in security as well. Security is a giant abstract thing. I'm bad at a lot of security. I'm good at the things I focused on over time. There's very specific things I spent time on, mainly around software security, but more importantly it's, pick where you want to focus. Even if you change directions, that's fine, but find one place and really dive there first. Don't try to boil the ocean. That is a good way to drown.




Recorded at:

Apr 23, 2022