
A New Way to Profile Node.js



Matteo Collina presents a new and straightforward way to identify bottlenecks in Node.js and beyond.


Matteo Collina is a Principal Architect at nearForm. He spends most of his days programming in Node.js, but in the past he worked with Ruby, Java and Objective-C. He is also the author of the Node.js MQTT Broker, Mosca, the fast logger Pino and of the Fastify web framework. He is a member of the Node.js Technical Steering Committee.

About the conference

Software is changing the world. QCon empowers software development by facilitating the spread of knowledge and innovation in the developer community. A practitioner-driven conference, QCon is designed for technical team leads, architects, engineering directors, and project managers who influence innovation in their teams.


Collina: I'm Matteo Collina, @matteocollina on Twitter. You will see my handles everywhere. I work for this company called NearForm, and we are a professional services company based in Ireland. If you need any help with Node.js, we are in business, just so that you know.

Once upon a time, we had just launched a new back-end API, and everything went fine. It was for a very famous newspaper here in London. At that point in time, they called me at 7 a.m. on a Sunday morning, because the full thing went down. I was level-3 support, so I was never expected, at any point in time, to be called. I was so confident of what my team had built. When that happened, our ELB autoscaling cluster was full. The traffic slowed to a crawl; nothing was working. In the end, the manager was really, really upset about this. People were really angry on Twitter. I don't know how many of you have been through that.

Why Is It Slow?

After all of this was solved, things relaxed. It was very easy to fix, actually, at that point in time. Because, you know what? I killed all the servers and restarted all of them. That worked. There was a very bad bug in our code. We'll get back to that bug later on, because it's important. Anyway, after that, on Monday morning I had to ask my question, "Why?" The question that you need to ask when these types of incidents happen - when applications start slowing down, you have a performance issue, you have a problem in production - is "Why?" Do a full post-mortem debugging session, a post-mortem on the incident, so you know what happened and what caused the problems that led to you being called on a Sunday morning. Maybe question your life a little bit, too.

The answer is always that there's a bottleneck, right? There's one bottleneck in your system that is being hit, and that's why things get slow, or essentially fail badly at some point. It's all because there's a bottleneck, and your system cannot grow to handle a bigger flow of data - data that is expected. This is probably the most important part of the talk: it's because of a bottleneck. Then we get into some more technical details.

There are two big questions, if you're working with Node.js, about why a process can be slow. There are basically two kinds of bottlenecks in a Node.js application. The first one is that the bottleneck is internal. You have an internal bottleneck, and I'd say that the Node.js process is on fire. Typically, it ends up with the CPU at 200%. Wait, it's a single-threaded system with the CPU at 200%? How can this happen? We'll talk about it. I have a 50-minute slot, so I need to build up a little bit of tension.

Then, the second case is that the bottleneck is external, and something else is on fire. Node.js is a runtime for building I/O-heavy applications, to some extent. You need to talk to a database or another API, collate the response, whatever, send it back. Maybe do a bit of React server-side rendering, and that's really slow.

Finding Bottlenecks

Anyway, the key part is that you can have a bottleneck on one side or the other. Now, consider that we know there is a problem; we had a problem in production. If we had a problem, we know that there is a bottleneck. Where is the bottleneck? You have seen a very nice keynote today, talking about how to know, more or less, in which API to find your bottleneck, with tracing data and so on. This is the path that you need to follow to find bottlenecks in a live production system.

You start with a nice performance issue, and there's a turtle there for a reason. Then, on your production system, you try to reach out and get some diagnostics data. You can use tracing, you can use an APM, you can use whatever product you want. I don't want to pitch one product versus another, just use one. There are a lot of companies and a lot of open source projects that provide these services, so I don't want to play favorites. I do have favorites, but I don't want to name them during the talk. The hardest part in all of this, and why you need to have those types of systems in place, is that you need to come to the point of being able to reproduce your problem.

If you can't reproduce your problem, you are never going to be able to fix it. That's the first part. There is a sad face on this slide, because that's the hardest part, more or less: you need to know exactly what your product is doing and investigate why things are happening. Have an understanding, at least at 10,000 feet, of, "I can reproduce this now. I can make my server crash. I'm happy now." How do you do that? How do you reproduce a performance issue?

Typically, what you need to do is hit your servers, and in terms of a web application that comes down to simulating load. Now, I am a Node.js developer, so I ended up using a lot of AB, WRK, and JMeter. In the end I wrote my own thing in Node. It generates the same amount of load as AB or WRK, but you can script it with Node.js. It's decently fast for a single-machine system. You can use Bees with Machine Guns, if you want to have some more cloud fun. You can definitely test the CDN with this.

We end up going back to our diagram. You are now able to reproduce the problem. Then, this is a very nice moment, because you can start debugging. You can still use your tracing system, your APMs, your stuff, to gather some more insights, but the most important bit is this, "Step 5: Reproduce on the Developer Machine," not the live environment. I don't like coding on servers; these days we deploy code inside functions, inside containers, inside Kubernetes clusters, inside VMs, five levels down. Once upon a time, you were able to SSH into a server, open a file with vi, and patch it in place. Now, that's not really a good way to do things. If you want to do some testing and development, you need to reproduce the problem on a workstation or something.

After that, especially in Node applications, there is a big question: "It's kind of a big problem. What do you do? How do you fix it?" Because that's the question. You know how to reproduce the problem. You don't know exactly what the problem is. You know that under certain conditions, your server goes down because of that problem. What you can do is tap into some more diagnostics, which now are fine-grained. The tracing system provides you a 10,000-feet view of what is happening in your distributed system. However, can I know what Node.js does, how my Node.js system is running my code? Yes, you can tap into some more fine-grained diagnostics data. Once upon a time, this diagnostics data was more or less a wall of text. You could decode the metrics, and after seeing a lot of problems and patterns over time you could interpret them, but it was not really straightforward to read and understand what was happening. In the end, in order to fix this kind of problem, my perfect solution was staring at the screen for a very long time.

Later, I got a dog. That was a definite upgrade to my workflow, because I started walking the dog. It probably has the same effect as me staring at the screen - a lot of time to think. After all of this, you get a performant application out of it. That was the best way to approach a bug, to solve a performance issue with Node.js. That's probably the last-resort approach - walk the dog. That's my last-resort approach.

As I said, NearForm is a professional services company, so we have been helping companies putting Node.js in production for some time. I've been working there five years, so a long time. Basically, I have been traveling all over the world, more or less. One of the best gigs I did, I was flown to Chicago, two hours, went home. That was it. I already knew what the problem was.


That is the thing, but can we do better? Who knows? Well, maybe yes. We have started building this tool at NearForm based on our experience solving these types of performance problems. This tool is open source, so you can check it out; it's fully open source. Look at the code, send pull requests, it's for people to use. You can install it with npm. It basically streamlines our process for solving performance issues in Node.js applications. That's what I'm going to talk about a little bit.
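If you want to try it yourself, the install is a one-liner (the npm package is named `clinic`; the `--help` invocations below are just a sketch of how to explore the subcommands):

```shell
# Install the Clinic.js command-line tool globally from npm
npm install -g clinic

# Each tool described below is a subcommand
clinic doctor --help
clinic bubbleprof --help
clinic flame --help
```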

Clinic, in reality, is one command, but it has three main components in it: Doctor, Bubbleprof, and Flame. Let's go through them one at a time.

Clinic Doctor

The first one is Clinic Doctor, and it collects some metrics on your Node application by injecting some probes into it. It monitors the Garbage Collector; it monitors the event loop, the health of the event loop, this magical thing - we'll dig deep into the event loop later. It monitors the CPU usage, and it monitors the active handles. What is a handle? A handle comes from the internals of Node. It comes from libuv; it's basically a socket, or a file, or some other native resource that is known by Node.js. They are very important, because those can leak and have other problems, so that is important stuff. In fact, that was the actual problem that we had when I was working with the newspaper.

This is Doctor. It does some important stuff; it plots some diagrams. You can run your code with Doctor on your local machine. The type of data that we extract here is very fine-grained; it generates a lot of data. You can run this in production, to some extent, but not full-time, because you'd literally get too much information out. It does something very important here: it is able to automatically detect some of the most well-known problems in Node.js applications. How do we do this?

Well, you've seen at this conference that there are a lot of talks about machine learning. We are using TensorFlow.js internally to make those types of recommendations. Check out our code, especially the part that detects potential I/O issues, powered by TensorFlow. Some others are powered by statistics. We tried the statistics approach; it didn't really work out in all cases, so we are moving them to the more machine-learning-based approach. It's a nice application of TensorFlow. We monitor all those things, and we create recommendations. That's what this tool does. It tells you what we guess is the problem in your Node application. The way we built this, I sat down with a statistician, a data scientist, for some time. Basically, he said, "If I see this curve, that's the problem. If I see this other curve, that's the problem." I said, "We can build software to detect those." That's how this tool was born.

Clinic Flame

The next one is Clinic Flame. It's basically a flame graph tool with a nice UI built for Node.js. It does CPU sampling; like all these things, it's based on the V8 profiler and stuff like that. It tracks top-of-stack frequency. Basically, it samples your call stack while your code runs. After the run is finished, we take all the samples, do some averaging, collect the metrics, and create a flame graph. You've probably seen flame graphs in the past. We have added some little bits and bobs, so we are able to detect where the bottleneck is - in this case, whether it is inside Node core or in a module. The nice part is that we have a bar here which tells you which thing you need to focus your attention on. We are able to extract certain data from V8 itself. For example, we can detect inlining, and we can detect other optimizations that are happening; it can be quite powerful. Those are under the options. We also differentiate between your code and the dependencies.

Clinic Bubbleprof

Let's move on to Bubbleprof, which is the new thing, to some extent. The thing is, if you have an I/O problem in your Node application, it can be extremely complex to debug. Most of the time, it does not come down to just, "There is a query. My database is slow." It comes down to some very complex interactions between database queries, promises, nextTicks, setImmediates, timers, and whatever else. It can get very complicated, if you want it to. Bubbleprof creates a bubble graph, which is this type of diagram - a visualization of asynchronous activity in your Node.js system. We'll go through this later with a lot of examples, just so that you know.

Live Demo

The question is, going back to our original starting point, "Where is the bottleneck?" We can ask Doctor, "Where is the bottleneck?" It will give you one of two answers; it chooses between two recommendations. In one case, if we say that the problem is in Node, we tell you, "Use Clinic Flame." If it's an I/O problem - it's slowing down because of I/O issues - then we tell you, "Use Bubbleprof." Now, it's the moment of a live demo, so may the demo gods be with me, hopefully. Let's see if my Docker container is up - it is up.

First thing, we have a Node server. What we can do with my Node server is send an HTTP query to it. It's perceivably slow. What does this thing do? Let's look it up. It's basically an HTTP endpoint that does a couple of queries, computes some magic, and returns something. Let's do a benchmark. I'm using my autocannon library. I'm doing this with two connections, because I think this is pretty slow, for five seconds, and then I'm doing some little magic tricks because this is a Node.js application. Basically, I can run my test straight against my Node process. We're starting up the process, detecting which port it is listening on, and automatically running my load test. It's in Node, so we can do these types of things. You can see here that the latency is horrible. It's 1.3 seconds at the 50th percentile. If you go to the 99th percentile, it doesn't get much worse. Also, look at the number of requests per second; those numbers are not really encouraging.

Let's see if I run this with maybe 100 connections. This will be a very nice number, very round. Note that the minimum goes so high because of a very bad approximation bug that I have in autocannon. How do you debug this? You use Node Clinic. You run "clinic doctor", which is our tool, and then we can still use autocannon inside Clinic. We do it with two connections, five seconds, on slash, and then we do "node 1.js". We just keep hitting our server with this. It spins up our process, injects the probes that we need, and actually runs the load. Then it analyzes the data - it's running TensorFlow now - and it also renders an HTML page. You see, it does a bunch of stuff. It also uses Browserify, if you are into this. I'm actually zooming in a little bit.
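As a rough sketch of the commands used in this demo (the `--autocannon [ ... ]` bracket syntax for passing load-test flags comes from Clinic's CLI; treat the exact flags as an assumption, and `1.js` stands for the demo server):

```shell
# Spawn the server, hit "/" with 2 connections for 5 seconds,
# then analyze the data and open the HTML report
clinic doctor --autocannon [ / -c 2 -d 5 ] -- node 1.js

# The same load-test integration works for the other tools
clinic bubbleprof --autocannon [ / -c 2 -d 5 ] -- node 1.js
clinic flame --autocannon [ / -c 2 -d 5 ] -- node 1.js
```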

Basically, it says it detects a potential I/O issue. We have a bunch of things here, a bunch of recommendations. There is, "Diagnose: Use Clinic Bubbleprof," but there is also a "Read more" section with a bunch of links to look into, if you want to investigate this topic more. It asks us to use Bubbleprof, so in order to do that we just run "clinic bubbleprof" and hit it with some load. Note that it had an impact on our latency. It went up, because Bubbleprof generates a lot of data. If you are running it on a normal production system, you are definitely going to generate up to maybe half a GB of data for a one-minute run. It generates a lot of information, and let me explain why: for every asynchronous activity in your system, it collects the full stack trace for that asynchronous activity. You can then navigate this in certain ways. Let me zoom in again.

For example, you can see on the right that this is the block of asynchronous operations. You can see that it does some asynchronous activity here. It's color-coded, so there is network, data, crypto, and scheduling. What is scheduling? Scheduling is - hover over it - promises, and nextTicks, and setImmediates, and timers. Then "other" is because we are using a library, mongojs, that has been instrumented with Async Hooks, which is the underlying technology that provides this in Node. That's what it is. If you have some issues with seeing this, we can switch to another color code. I don't know if you can see this better or not, probably not. We have two important parts here. One is you can see this is forking. Let's start from the top; this is the root node. You see this little thing here, lingering. We'll dig into this a little bit later in the talk.

We go down here, and this is when we receive a TCP call from the network. It says that we have eight frames from the previous async operation, createServerHandle. This is from Node core. These are some empty frames; these are all the things that happen natively. Then you have more things happening in Node core internals. This is Node core doing its stuff. You can see that we are handling our connectionListener, so we are probably setting up the HTTP parser here. Then we get to the code of my application, and it goes into this mongojs library. You can see that this starts from line 11 of my application, and it goes into mongojs and creates a cursor. We are reading data from Mongo, so it creates a cursor - makes sense.

Note that if we hover over this thing, you can see that a nice pattern emerges. You can see that there are blocks of stuff happening here. This is because we're only doing two connections. We are issuing two connections at the same point in time, together. Then those two finish, then those two move to the second block, then it replies. You can see even these types of patterns happening in the system. If you look at this one, it says that the problem is on line 13. We have two problems in my system, on line 11 and line 13. Here you go: line 11, line 13. There are two queries to Mongo.

What I do is open my Mongo console and go to my database. This thing is just querying some npm data. We take the query that we are doing, and we just ask Mongo to explain it. We can see that Mongo is actually doing a full sort of that collection, so it is probably missing an index. If we look more closely, we can see that there is another collection called modulesIndexed in here that may have an index. Let's see what the queryPlanner thinks. In fact, it's doing an index scan. If we run this, you can see that this is fast. If we run the previous one, you can see that it is actually not fast. What we can do now is use this collection instead of the previous one. We move to this query, which uses modulesIndexed instead.

Let's check if this solved our problem. Let's see with two connections if our system has improved. It takes some time, five seconds. Now, we can do 68 requests per second. This is a bit better, but probably not what you'd expect. Maybe there's something else. Or maybe it's just because we're doing only two connections - that's our problem. Let's ramp it up. Let's throw more resources at it and see if that will work. It went up a little bit, but the latency also spiked.

There is some other problem; we have some other issue in this piece of code. We can go back and use Doctor to tell us what changed. We run it in the exact same way we were running it before: inject the probes, run the load test, and then generate our HTML report. Now, it detects something else: it detects that there is an event loop issue. This is because we're actually doing more requests than before. You can see that now the event loop is blocking for a significant amount of time, while before, if you look closely, it was spiking and blocking a little bit, but that was not the big source of the problem. Right now, instead, you see that it's massively blocking. How is it possible that the CPU is spiking to more than 150%? You can see it here. Why is this happening? Why is there that big spike over 150%? What is that?

The reason for that is Node.js itself. You have been told for a long time that Node.js is a single-threaded system. This is true: JavaScript is single-threaded. In that runtime, that specific JavaScript program is executing only one function at a time. However, V8 and Node use threads for a lot of other stuff. For example, the Garbage Collector in V8 runs on a separate thread 80% of the time. There is a good blog post and talk about this from some friends of mine at Google, so you should check that out. The CPU goes up because a lot of garbage has been allocated and you need to collect it. That's why it's spiking over 100%.

What does Doctor recommend? Doctor recommends using Clinic Flame, so that's what we're going to use. You probably get the gist of this by now. We generate the data; it takes some time to do the analysis. There is a nice feature here that my team has put in, called Presentation Mode. Hopefully, you will actually see this a bit better. You can see a couple of nice things here. You can see that there is a compute-magic function that is taking most of the time. Why does compute-magic take so much time? I don't know. Who put that in there? I don't know. You can see a bunch of stuff happening, and that function is dominating the profile.

You can also check, for example, V8. You can enable internal V8 details, so you can actually dig deeper into what V8 is doing. You can dig into the V8 runtime, the natives, that sort of stuff, but there is also this button called Merge. Normally, by default, we merge optimized and unoptimized paths together, because that makes sense for most people - usually you don't want to know when a function gets optimized - but in some cases you might want to disable that. You can see that the diagram gets a completely different shape, before and after being optimized. Note that V8 is going to give you a hint about one of the problems, which is JsonStringifier, SerializeString. What could the problem be? How are we computing magic?

Going back to our function, it's on line 25 of my file. I'm opening up my editor and going to line 25. Line 25, computeMagic: createHash, JSON.stringify, right there. Who did this? Who is the engineer that put this in? It was me, so I'm sorry. Now, we have a much better alternative that doesn't have this problem. With that in place, let me see if I solved the problem by checking again with my load tester, so I can verify that my problem has been fixed. Here we go. Now it's probably better. You can even check with Doctor whether the data matches what you would expect, and so on. There's, of course, still a lot of room for optimization. I just wanted to show this to you. You could make the two queries in parallel, for example.

You can see that it's a much healthier type of diagram; in particular, the event loop is not blocking. The CPU is going up and down nicely, and the memory usage is doing this sawtooth pattern, which is very normal for Node.js systems. That's it. Let's go back to our slides.

The Event Loop

We have seen Clinic, and we can now ask the question, "How can we improve the performance of Node.js applications?" You have been on this journey with me, and we have improved the performance of my application 50 times. It's a good result.

The key thing to understand before optimizing a Node.js system is the event loop. How does the Node.js event loop work? This is the Node.js event loop. Have you ever seen this diagram? Did you understand how it worked? When we wrote this piece of documentation, it took us six months to write it down. It's probably one of the little-known, secret pieces of docs that's super important. The event loop goes down through phases; you get incoming connections, and you react to them. The most important part is the URL that explains all of this, so I'm just putting it down there.

This is a more high-level diagram of what that looks like. We have the Node.js system that schedules some activity, either on the kernel async I/O barrier - with kqueue, epoll, whatever your system uses - or on the thread pool. When those things finish, they go back into an event queue. From that event queue, Node.js will then execute them. What does an event look like? Well, this is an event in Node. From JavaScript, you add a function. You create a function that typically has a closure, and maybe some state in an object or something, for an I/O event. You say, "When that happens, call this function." Then the I/O event happens, and then that specific function is called. That's it; it's nothing more complicated than this. We receive data, we send data - that's what happens at the core level of the system.

Now you should ask a question: we have talked about I/O events, but what about promises? What about nextTicks? What about setImmediates, and timers, and all that stuff? Before that: there's no parallelism of function execution, so we execute only one function at a time. That is a key part of a Node.js system - there's no parallelism. A lot of people seem to forget this in practice when working with Node. Only one function runs at any given time. In Node 11, the current release of Node, we have Worker Threads; you might want to go check those out. With those, you can run multiple functions in parallel. But within a single event loop, you have only one function running at any given time.

nextTick, Promises, setImmediate

Then we get to the problem of fake asynchronous activity, because this is fake asynchronous activity. What do you do if you need to expose an API, and that API can be called synchronously or asynchronously? To respect the contract, you need to use something that creates some asynchronicity in your system. You have nextTicks. NextTick is this thing that Node invented. We're good, we invented something. It's not in the browser; it's unique to Node - not really a brilliant piece of technology, overall. We keep changing the implementation because it's typically a big bottleneck of Node.js.

We do continuous performance optimization on that part every three months. When a new V8 version comes in, we tend to use its latest and greatest features for that. Now, nextTicks are always executed when the current function finishes. When you receive an I/O event from the underlying operating system and its callback finishes, you execute all of the nextTick queue. As part of that, we also execute promises, but nextTicks come first: we typically execute nextTicks, and then we execute promises. Note that we execute promises synchronously, but sort of asynchronously: when you do new Promise(), that executor function is called immediately, but the promise will actually be resolved in the future.

Just to complicate things, we also have setImmediate, which is another thing that does not exist in the browser. Somebody said it did, but that's a different chapter of life. It exercises the same flow as an I/O event, so it exercises the full pipeline. setImmediate passes through all of this, while promises and nextTicks stay within this inner part. It makes a complete difference when we are running a lot of promises and nextTicks in our application, because we are never receiving I/O events. We say that using promises or nextTick, you can actually starve the event loop. It keeps staying here, scheduling fake asynchronous activity, never exiting, and never asking the async I/O barrier, "Is there something that we can do?"
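A small demonstration of that starvation: a chain of nextTicks keeps the loop in the microtask phase, so a setImmediate scheduled up front only runs after every single tick has drained - with unbounded rescheduling, I/O callbacks would never get a chance to run.

```javascript
let ticks = 0;
let ticksBeforeImmediate = -1;

// Scheduled first, but it goes through the full event-loop pipeline,
// like an I/O event would.
setImmediate(() => {
  ticksBeforeImmediate = ticks; // how much work ran before "I/O-ish" work
});

// Each call reschedules itself; the nextTick queue is drained completely
// (including newly added entries) before the loop moves on.
function spin() {
  ticks++;
  if (ticks < 1000) process.nextTick(spin);
}
process.nextTick(spin);
```

All 1000 ticks run before the immediate fires, even though the immediate was scheduled first.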

Going back to the original question of how we optimize a Node.js application: the hardest part, working with a lot of companies and clients over the years, is knowing when one piece of code is executed versus when another piece of code is executed. What is the order? Do those two things always come in the same order? Are they swapped? Are they interleaved? How do my flows interact with each other? How does this reflect on my Garbage Collector? Because we have a garbage-collected system. How does this impact our Garbage Collector? Which I won't touch on - garbage collection is another 50-minute talk. Clinic.js can help you understand how your Node application works. That's the key tagline of this.

It's very important to know and reason about these things when we see a production issue. Let me show a quick example. You have probably seen Ryan Dahl's demo of Node.js from a long time ago. How many of you are using promises? How many of you know how promises work? There is a high chance that you're using them wrong. Just saying - that comes from my experience. A very simple mistake when you're dealing with promises is when you switch from a non-promise context to a promise and start a promise chain. If you're running a simple system in Node, in the context of, for example, a web server, you always need to put in this second part. You need to react to errors; you need to check if the promise rejected. Why? Because otherwise, you will leak a file descriptor.

That was my problem back then. I was leaking file descriptors. If I remove this thing and my handler throws, my response will never be sent. My load balancer will never receive an answer, and the request will time out. Those file descriptors would linger in my system until my operating system cleaned them up. That, with the default configuration, is like 50 minutes or something. That's what it is.

What we are doing here is using promisify to create some asynchronous activity. Then, after that, we send a response. This is the Bubbleprof diagram of it. I'm actually going to open it up. Hey, it's working. What you can see here is that there's a lot of scheduling happening - the purple one - because we are using promises. There's a lot of scheduling activity happening, because that's what promises are. We can actually open it up, and we can see that when we are creating a promise, in fact, this is forking. Why is it forking? Well, because that's how promises work, to some extent.

You have a promise, and then you have something that you execute when the promise is resolved. It's a fork of your main execution path, of the main execution of your application. Even though you chained them together, in fact, the context model of this is a forking model. One branch goes nowhere. The other one is the one that continues. Because you do something, then "then", then "then", then "then", all of those "then"s are actually siblings - that's what is happening. We'll see another example of this in a moment. This is our end promise.
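What that forking means in code: every `.then()` call returns a brand-new promise, so a chain is really a series of forks from the previous promise, which is why Bubbleprof draws the continuations as separate branches.

```javascript
const p = Promise.resolve(2);
const a = p.then((v) => v + 1);  // a is a new promise, forked from p
const b = a.then((v) => v * 10); // b is another new promise, forked from a

// p, a, and b are three distinct promise objects; only b's branch
// "continues" with the final value.
```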

Going back to the talk, the next example is async/await. You see that we are awaiting here. We do the async something, and we await, and we specify a catch handler to avoid leaking file descriptors when this throws. We are awaiting, and doing res.end() - that will work. It's the same code as before, so this should generate the same diagram. Do you think it generates the same diagram? No.
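The await-with-catch pattern on the slide looks roughly like this - `doQuery` is again a hypothetical stand-in, and attaching `.catch()` directly to the awaited promise keeps a rejection from escaping the handler:

```javascript
// Hypothetical async operation that rejects.
const doQuery = () => Promise.reject(new Error('boom'));

async function handler() {
  // The .catch() on the awaited promise supplies a fallback value, so
  // the function always reaches its response (res.end() in a real server).
  const data = await doQuery().catch(() => 'fallback');
  return data;
}
```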

Participant 1: Depends on the version of Node.js.

Collina: Depends on the version of Node.js. True, but the version that generates a different one has not been released yet.

Participant 1: True.

Collina: I had the wrong image. What you can see here is that the diagram has changed. The forking is actually happening; there are three forks. With a promise, we had two. Now, we have three, and even more. The reason for that is that, up until the latest changes in a patch that happened this year, when you're scheduling an async function, a throwaway promise is created under the hood to make async/await work. In fact, async/await also does a lot of ticks on the microtask queue internally, and that's what is happening here, and that's why you're seeing a difference in the diagram.

From a user perspective, it doesn't change much. But the moment you're adding a lot of awaits and async functions in your code, that might impact your latency, because you're basically deferring a lot of computation into the future that is probably not needed at that point in time. It's worthwhile to know. Let's go back to callbacks.

I love callbacks, I love async/await. I just like JavaScript. We are reading the same file three times and then concatenating the three copies. A completely useless piece of code, but it's very nice to see how it will be rendered. We can open this up, and basically you can see that there is a little bit of scheduling, true, but there is a lot of data activity happening. In fact, this is very nice to see, because this is Node 10. In Node 10, when we're reading a file, we started segmenting the reads of that file. You can open it up, and you can see it here. This is where we schedule the read of the file. We call readFile. Then here, a bunch of things happen. You can see that in order to read that file, we actually do three asynchronous activities in Node.js core. This is where it's happening. (I'll be late. I thought, "I don't have enough material," so I have a lot of stuff.)

Let's do the same thing with async/await. You have seen it was very linear. Go back to this - this was very linear, one after the other. This is better to read, though. This is what is happening, and it's completely different. Why? Because we have all of those throwaway promises being created: we have our throwaway promise here, another nice throwaway promise under here, another one here, and another one here, because we are doing three awaits, so we are creating three throwaway promises. Then we have a big promise here, which is my main async/await function, which has scheduled all this stuff.

Hopefully this will be sorted in Node 12, so this will probably be better. That's the current situation, and it's important to visualize this and to see the difference between one and the other. The end result - I'll go through this very quickly - with Promise.all() and three calls to readFile, so basically we are using promises only. With Promise.all(), we are getting a diagram that looks more or less like what we would expect. We only get one throwaway promise, because we call await only once. The other things are actually happening in parallel, which is what we would expect.

Performance Considerations

Let's conclude with some performance considerations. In Node.js, if you have a slow I/O operation, typically, your Node application increases the number of concurrent tasks. That's what's happening: because I have a slow database query, my Node.js process is idle, so I'm accepting new TCP connections and serving new HTTP requests. Because of that, the memory consumption of my system goes up. Why? We are allocating data structures to handle those requests while we are issuing queries to the database. What happens when the memory consumption goes up? It increases the amount of work that your garbage collector needs to do.

Under high load, the garbage collector will steal precious CPU cycles from your JavaScript critical path. That, in the end, is the problem. In fact, in a lot of cases, you can see that the garbage collector is doing a lot of extra work, and the CPU on your system is running at 100%, or 200%, or whatever, because there is too much concurrency on your system and your Node process is allocating too much data, which is very important to know. Unfortunately, latency and throughput are connected in Node.js. You wouldn't expect this, but under severe load those two things are connected, and it's very important to know it.


If you're doing any performance work, set quantifiable performance goals. "Making things faster" does not make any sense as a goal, so don't do this. Don't say, "I want to optimize this piece of code." No. Ask, "Is this code fast enough?" Unless you're working on Node core: there's a bunch of people who enjoy trying to squeeze a few percentage points out of a benchmark, but that's the only thing they're getting out of it - squeezing percentage points out of a benchmark. I'm doing that all the time, so I'm talking about myself. If you're using Node, you may want to choose fast libraries. One is Pino, a high-speed logging library. The other one is Fastify, a high-speed web framework.

Beware of the rabbit hole, because performance work takes forever. It's an unbounded task, so if you don't give yourself an escape hatch, you can keep optimizing things forever. You might have other activities. Remember that you don't need to go to ludicrous speed, if you have seen the movie.

Questions and Answers

Moderator: Do I really have to balance business goals with speed?

Collina: Yes, you do. Unfortunately, very unfortunately, you do. You've got to do that. The point is, it's an unbounded activity. The most precious thing on a software development team is developer time. If you don't have a problem in production, don't over-optimize, don't spend time. If you're building a product and one of your key characteristics and key features is performance and speed, then you want to spend the time. Most of the time, you don't -

Participant 2: What kind of magic are you using under the hood to instrument your apps, Async Hooks?

Collina: Async Hooks. With Bubbleprof, it's all based on Async Hooks, which is a part of Node core, all implemented in C++, to be able to track the asynchronous context from one call to the other.

Participant 2: My follow up question to that is, how hard would it be to make my own simple version of that myself? On a scale of 1 to 100.

Collina: Sorry, it depends on the environment. The point is whether you're talking about the browser, about Node.js, or about implementing it yourself on top of Node.

Participant 2: To use Async Hooks and Node, is there something I could build a simple version of this on my own code, or should I really just use this?

Collina: You should really just use it, because Async Hooks is implemented in C and C++. You need to go down to the runtime level and provide that functionality where the asynchronous call is being implemented.

Participant 2: So it's a pain to build.

Collina: It took us two years to ship, and it's still experimental. Consider that - that is the key part of the whole thing.

Participant 3: Does Clinic work on post-mortem data? You demoed it live, always running it with your autocannon load test. Does it work post-mortem?

Collina: Yes and no. The answer is, we need to instrument the Node process. If we don't have the data, we can't do anything about it. In order to have the data, there are some flags that you can pass to Clinic. You can gather that data, store it on disk, and then do the analysis and the rendering on another machine. That is feasible. Of course, it doesn't work in a post-mortem debugging scenario, to some extent.
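If memory serves, the flags in question are Clinic's --collect-only and --visualize-only options (check clinic --help on your version for the exact spelling); the workflow he describes would look something like this illustrative sketch:

```
# on the production-like machine: collect data only, don't open a report
clinic bubbleprof --collect-only -- node server.js

# copy the generated *.clinic-bubbleprof folder elsewhere, then render it
clinic bubbleprof --visualize-only 1234.clinic-bubbleprof
```

The folder name here (1234.clinic-bubbleprof) is a placeholder; the actual name includes the PID of the instrumented process.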




Recorded at: Jul 12, 2019