Load Testing APIs and Websites with Gatling: It’s Never Too Late to Get Started

Key Takeaways

  • Conducting load tests against APIs and websites can both validate performance after a long stretch of development and get useful feedback from an application in order to increase its scaling capabilities and performance.
  • Engineers should avoid creating “the cathedral” of load testing and end up with little time to improve performance overall. Write the simplest possible test and iterate from there.
  • Gatling can be used to conduct stress tests, soak tests, and capacity tests. It is also possible to write Give-When-Then style tests.
  • When analyzing results, engineers must examine percentile results, rather than focusing on averages.
  • It is important to establish the goals, constraints, and conditions of any load test. Always identify and verify any assumptions, e.g. a user’s default device, network speed, or the type of server an application will run on in production.

You open your software engineering logbook and start writing—“Research and Development Team’s log. Never could we have foreseen such a failure with our application and frankly, I don’t get it. We had it all; it was a masterpiece! Tests were green, metrics were good, users were happy, and yet when the traffic arrived in droves, we still failed.”

Pausing to take a deep breath, you continue—“We thought we were prepared; we thought we had the hang of running a popular site. How did we do? Not even close. We initially thought that a link on TechCrunch wouldn’t bring that much traffic. We also thought we could handle the spike in load after our TV spot ran. What was the point of all this testing if the application stopped responding right after launch, and we couldn’t fix this regardless of the number of servers we threw at it, trying to salvage the situation as we could.”

After a long pause, “It’s already late, time goes by. There is so much left to tell you, so many things I wish I knew, so many mistakes I wish I never made.” Unstoppable, you write, “If only there was something we could have done to make it work...”

And then, lucidity striking back at you—you wonder: why do we even do load testing in the first place? Is it to validate performance after a long stretch of development or is it really to get useful feedback from our application and increase its scaling capabilities and performance? In both cases validation takes place, but in the latter, getting feedback is at the center of the process.

You see, this isn’t really about load testing per se; the goal is to make sure your application won’t crash when it goes live. You don’t want to be writing “the cathedral” of load testing and end up with little time to improve performance overall. Often, working on improvements is where you want to spend most, if not all, of your time. Load testing is just a means to an end and nothing else.

If you are afraid because your deadline is tomorrow and you are looking for a quick win, then welcome, this is the article for you.

Writing the simplest possible simulation

In this article, we will be writing a bit of code, Scala code to be more precise, as we will be using Gatling. While the code might look scary, I can assure you it is not. In fact, Scala can be left aside for the moment and we can think about Gatling as its own language:

import io.gatling.core.Predef._
import io.gatling.http.Predef._

class SimplestPossibleSimulation extends Simulation {

  // the base URL below is an example target; replace it with your own application
  val baseHttpProtocol = http
    .baseUrl("https://computer-database.gatling.io")

  val scn = scenario("simplest")
    .exec(http("Home").get("/"))

  setUp(
    scn.inject(atOnceUsers(1))
  ).protocols(baseHttpProtocol)
}


Let’s break this down into smaller parts. As Gatling is built using Scala, itself running on the Java Virtual Machine, there are some similarities to expect. The first one is that the code always starts with package imports:

import io.gatling.core.Predef._
import io.gatling.http.Predef._

They contain all the Gatling language definitions, and you’ll use them all the time, along with other imports depending on your needs. Then, you create a class to encapsulate the simulation:

class SimplestPossibleSimulation extends Simulation {}

SimplestPossibleSimulation is the name of the simulation, and Simulation is a construct defined by Gatling that you “extend”; it will contain all the simulation code, in three parts: (1) we need to define which protocol we want to use and which parameters we want it to have:

val baseHttpProtocol = http
  .baseUrl("https://computer-database.gatling.io")

(2) The code of the scenario itself:

val scn = scenario("simplest")
  .exec(http("Home").get("/"))

Notice I used the term scenario even though we only spoke about simulations until now. While they are often conflated, they have really distinct meanings:

  • scenario describes the journey a single user performs on the application, navigating from page to page, endpoint to endpoint, and so on, depending on the type of application
  • simulation is the definition of a complete test, with populations (e.g. admins and users) assigned to their scenarios
  • When you launch a simulation, the result is called a run

This terminology helps make sense of the final section, (3) setting up the simulation:

setUp(
  scn.inject(atOnceUsers(1))
).protocols(baseHttpProtocol)

Since a simulation is roughly a collection of scenarios, it is the sum of all the scenarios configured with their own injection profile (which we will ignore for now). The protocol on which all the requests of a scenario are based can either be shared, if chained after setUp, or defined per scenario, if we had multiple scenarios, scn1 and scn2, as such:

setUp(
  scn1.inject(atOnceUsers(1)).protocols(httpProtocol1),
  scn2.inject(atOnceUsers(1)).protocols(httpProtocol2)
)

Notice how everything is chained, separated by a comma: scn1 configured with its own injection profile and protocol, scn2 likewise, and so on.

When you run a Gatling simulation, it will launch all the scenarios configured inside at the same time, which is what we’ll do right now.

Running a simulation

One way to run a Gatling simulation is by using the Gatling Open-Source bundle. It is a self-contained project with a folder to drop simulation files inside and a script to run them.



All installations require a proper Java installation, JDK 8 minimum; see the Gatling installation docs for all the details.

After unzipping the bundle, you can create a file named SimplestPossibleSimulation.scala inside the user-files/simulations folder. Then, from the command line, at the root of the bundle folder, type:

./bin/gatling.sh

Or, on Windows:

bin\gatling.bat

This will scan simulations inside the previous folder and prompt you, asking which one you want to run. As the bundle contains some examples, you’ll have more than one to choose from.

While this is the easiest way to run a Gatling simulation, it doesn’t work well with source repositories. The preferred way is to use a build tool, such as Maven or sbt. If this is the solution you would like to try, you can clone our demo repositories using git:

Or, if you want to follow along with this article, you can clone the following repository which was made for the occasion: 

After running the test, you notice there is a URL at the end of the output, which you open and stumble upon:


You will notice there is a request named “Home Redirect 1” which we didn’t make ourselves. You might guess that this is Gatling automatically following a redirect and reporting the response times of the different queries separately, and you would be right. Still, the result does look kind of underwhelming: a single user making a single request is not much of a load test.

Configuring the Injection Profile

Looking back, when configuring our scenario we did the following:

scn.inject(atOnceUsers(1))

This is great for debugging purposes, but not really thrilling as far as load testing is concerned. The thing is, there is no right or wrong way to write an injection profile, but first things first.

An injection profile is a chain of rules, executed in the provided order, that describes at which rate you want your users to start their own scenario. While you might not know the exact profile you must write, you almost always have an idea of the expected behavior. This is because you are typically either anticipating a lot of incoming users at a specific point in time, or you are expecting more users to come as your business grows. 

Questions to think about include: do you expect users to arrive all at the same time? This would be the case if you intend to offer a flash sale or your website will appear on TV soon; and do you have an idea of the pattern your users will follow? It could be that users arrive at specific hours, are spread across the day, or appear only within working hours, etc. This knowledge will give you an angle of attack, which we call a type of test. I will present three of them.
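As a sketch of how such expectations translate into Gatling’s open-model DSL, here is a minimal simulation; the target URL, scenario, and all figures are made-up illustrations, not recommendations:

```scala
import scala.concurrent.duration._

import io.gatling.core.Predef._
import io.gatling.http.Predef._

class ProfileSketchSimulation extends Simulation {

  // hypothetical target and journey, for illustration only
  val httpProtocol = http.baseUrl("https://computer-database.gatling.io")
  val scn = scenario("browse").exec(http("Home").get("/"))

  setUp(
    scn.inject(
      nothingFor(5.seconds),                    // quiet warm-up period
      atOnceUsers(100),                         // flash-sale style spike
      rampUsers(500) during 2.minutes,          // progressive arrivals
      constantUsersPerSec(20) during 10.minutes // steady daytime flow
    )
  ).protocols(httpProtocol)
}
```

The steps run in the order given, so a single profile can mix a spike, a ramp, and a steady flow.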

Stress testing

When we think about load testing, we often think about “stress testing,” which turns out to be only a single type of test: the “flash sale” scenario.

The idea is simple: lots of users, the smallest amount of time possible. atOnceUsers is perfect for that, with some caveats:

atOnceUsers(10000)

It is difficult to say how many users you can launch at the same time without being too demanding on the hardware. There are a lot of possible limitations: CPU, memory, bandwidth, the number of connections the Linux kernel can open, the number of available sockets on the machine, etc. It is easy to strain the hardware by choosing too high a value and end up with nonsensical results.

This is where you might need multiple machines to run your test. To give an example, a properly tuned Linux kernel can easily open 5k connections every second, while 10k is already too much.

Splitting 10 thousand users over the span of a minute would still be considered a stress test, because of how strenuous the time constraint is:

  rampUsers(10000) during 1.minute

Soak test

If the time span gets too long, the user journey will start feeling more familiar. Think “users per day,” “users per week,” and so on. However, imitating how users arrive over a day is not the purpose of a soak test, just a consequence.

When soak testing, what you want to test is the system behavior under a long stretch of time: how does the CPU behave? Can we see any memory leaks? How do the disks behave? The network?

A way of doing this is to model users arriving over a long period of time:

  rampUsers(10000000) during 10.hours  // ~277 users per sec

This will feel like doing “users per day.” Still, unless modeling your analytics exactly is the goal, computing the number of users per second your ramp-up amounts to and reducing the duration will give you results faster:

  constantUsersPerSec(300) during 10.minutes

That said, choosing rampUsers over constantUsersPerSec is just a matter of taste. The former gives you an easy overview of the total users arriving, while the latter is more about throughput. However, thinking in terms of throughput makes it easier to do “ramping up,” i.e. progressively arriving at the final destination:

  rampUsersPerSec(1) to 300 during 10.minutes,
  constantUsersPerSec(300) during 2.hours

Capacity test

Finally, you could simply be testing how much throughput your system can handle, in which case a capacity test is the way to go. Combining methods from the previous tests, the idea is to hold the throughput at a level for some arbitrary time, increase the load, hold again, and continue until everything breaks down and we find the limit. Something like:

  constantUsersPerSec(10) during 5.minutes,
  rampUsersPerSec(10) to 20 during 30.seconds,
  constantUsersPerSec(20) during 5.minutes,
  rampUsersPerSec(20) to 30 during 30.seconds,

You could say doing this 20 times would be a bit cumbersome...but as the base of Gatling is code, you could either write a loop that generates the previous injection profile, or use our DSL dedicated to capacity testing:

  incrementUsersPerSec(10)
    .times(20)
    .eachLevelLasting(5.minutes)
    .separatedByRampsLasting(30.seconds) // optional
    .startingFrom(10) // users per sec too!
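For comparison, the hand-rolled loop version might look like this; a sketch assuming a scenario scn is already defined, generating the same constant/ramp pairs programmatically:

```scala
import scala.concurrent.duration._

import io.gatling.core.Predef._

// Generate 20 capacity levels: hold a throughput for 5 minutes,
// then ramp by 10 users/sec over 30 seconds to the next level.
val levels = (1 to 20).flatMap { i =>
  val fromRate = 10 * i
  val toRate = 10 * (i + 1)
  Seq(
    constantUsersPerSec(fromRate) during 5.minutes,
    rampUsersPerSec(fromRate) to toRate during 30.seconds
  )
}

setUp(
  scn.inject(levels: _*) // assumes `scn` is defined elsewhere
)
```

Because the profile is just a Scala collection, you can tweak the step size or level duration in one place.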

Modeled graphically, this injection profile looks like a staircase: flat levels of throughput separated by short ramps.

And that’s it!

Where to start with load testing?

If it is your first time load testing, whether you already know the target user behavior or not, you should start with a capacity test. Stress testing is useful, but analyzing the metrics is really tricky under such a load: since everything fails at the same time, analysis becomes difficult, even impossible. Capacity testing offers the luxury of approaching failure slowly, which is more comfortable for a first analysis.

To get started, just run a capacity test that makes your application crash as soon as possible. You only need to add complexity to the scenario when everything seems to run smoothly.

Then, you need to look at the metrics:

What do all of these results even mean?

The above chart shows response time percentiles. When load testing, we could be tempted to use averages to analyze the global response time, and that would be error-prone. While an average can give you a quick overview of what happened in a run, it sweeps under the rug all the things you actually want to look at. This is where percentiles come in handy.

Think of it this way: if the average response time is some number of milliseconds, how does the experience feel in the worst case for 1% of your user base? Better or worse? How does it feel for 0.1% of your users? And so on, getting closer and closer to zero. The more users and requests you have, the closer you’ll need to get to zero in order to study extreme behaviors. To give you an example, if you had 3 million users performing a single request on your application and 0.01% of them timed out, that would be 300 users who weren’t able to access your application.

Percentiles, which are usually used, correspond to thinking about this the other way around: the 1% worst case for users is turned into “how does the experience feel, at best, for 99% of the users,” 0.1% is turned into 99.9%, etc. A way to picture the computation is to sort all the response times in ascending order and mark spots:

  • 0% of the response times is the minimum, marking the lowest value
  • 100% is the maximum
  • 50% is the median

From here, you go to 99% and as close to 100% as possible, by adding nines, depending on how many users you have.
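The marking procedure can be sketched in a few lines of plain Scala (nearest-rank method; the sample response times are invented):

```scala
// Nearest-rank percentile: sort the samples, then take the smallest value
// covering at least p% of all samples.
def percentile(samplesMs: Seq[Double], p: Double): Double = {
  require(samplesMs.nonEmpty && p >= 0 && p <= 100)
  val sorted = samplesMs.sorted
  val rank = math.ceil(p / 100.0 * sorted.size).toInt
  sorted(math.max(rank - 1, 0))
}

val times = Seq(35.0, 36.0, 38.0, 39.0, 40.0, 41.0, 42.0, 45.0, 120.0, 250.0)

percentile(times, 0)   // minimum: 35.0
percentile(times, 50)  // median: 40.0
percentile(times, 99)  // 99th percentile: 250.0
percentile(times, 100) // maximum: 250.0
```

Notice how the average of this sample (68.6ms) sits far from the median precisely because of the two slow outliers, which is exactly what averages hide.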

In the previous chart, we had a 99th percentile of 63 milliseconds, but is that good or not? With the maximum being 65, it would seem so. Would it be better if it were 10 milliseconds, though?

Most of the time, metrics are contextual and don’t have any meaning by themselves. Broadly speaking, a 10ms response time on localhost isn’t an achievement, and it would be physically impossible from Paris to Australia because of the speed of light. We have to ask, “in what conditions was the run actually performed?” This will greatly help us deduce whether or not the run was actually that good. These conditions include:

  • What type of server is the application running on?
  • Where is it located?
  • What is the application doing?
  • Is it behind a network?
  • Does it use TLS?
  • What is the scenario doing?
  • How do you expect the application to behave?
  • Does everything run on the same cloud provider, in the same data center?
  • Is any kind of latency to be expected? Think mobile (3G) or long distances.
  • Etc.

This is the most important part. If you know the running conditions of a test, you can do wonders. You can (and should) test locally, with everything running on your computer, as long as you understand what it boils down to: no inference can be made about how the application will run in production, but it allows you to do regression testing. Just compare the metrics between multiple runs; they will tell you whether you made performance better or worse.

You don’t need a full-scale testing environment to do that. If you know your end users are mobile users, testing with all machines located in a single data center could lead you to a disaster in production. However, you don’t need to be 100% realistic either. Just keep in mind what the conditions imply and what you can deduce from them: since a data center is essentially a huge local network, results can’t be compared to mobile networks, and the observed results would in fact be way better than reality.

Specifying the need

Not only should you be aware of the conditions under which the tests are run, but you should also decide beforehand what makes the test a success or a failure. To do that, we define criteria, as such:

  1. Mean response time under 250ms
  2. 2000 active users
  3. Less than 1% failed requests

There is a catch, though. Of these three acceptance criteria, only one is actually useful to describe a system under load. Can you guess which one, and what the others can be replaced with?

(1) “Mean response time under 250ms”: It should come naturally from the previous section that while the average isn’t bad, it isn’t sufficient to describe user behavior; it should always be seconded by percentiles:

  • Mean response time under 250ms
  • 99th percentile under 250ms
  • Max under 1000ms

(2) “2000 active users”: This one is trickier. Nowadays we are flooded with analytics showing us “active users” and the like, so it can be tempting to define acceptance criteria using this measurement. That is an issue, though. The only way to model a number of active users directly is by creating a closed model, and there you end up with a queue of users, which is where the issue lies. Just think about it: if your application were to slow down, users would start piling up in the queue, but only a fixed number (the maximum allowed into the application) would be “active” and performing requests. The number of active users would stay the same, while the rest would be waiting outside, as they would at a shop.

In real life, if a website goes down, people keep refreshing the page until something comes up, which further worsens the performance of the website until people give up and you lose them. You shouldn’t model a queue of users unless it really is your use case; you should instead target a number of active users. That can be difficult, but it is doable. It will depend on the duration of the scenario (including response times, pauses, etc.), the network, the injection profile, and so on.
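If a waiting queue really is your use case, the closed model can be sketched with Gatling’s concurrent-user injectors; a sketch assuming a scenario scn defined elsewhere, with figures mirroring the criteria above:

```scala
import scala.concurrent.duration._

import io.gatling.core.Predef._

// Closed model: Gatling holds the number of concurrent users fixed,
// starting a new virtual user as soon as one finishes its scenario.
setUp(
  scn.inject(
    rampConcurrentUsers(0) to 2000 during 2.minutes, // fill the "shop"
    constantConcurrentUsers(2000) during 10.minutes  // hold 2000 active users
  )
)
```

Anyone beyond the 2000th concurrent user is effectively queued outside, which is exactly the behavior discussed above.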

This is what you could measure instead:

  • 2000 active users
  • Between 100 and 200 new users per second
  • More than 100 requests per second

(3) “Less than 1% failed requests” was in fact the only criterion of the three that properly represents a system under load. However, it is not to be taken as a rule of thumb: depending on the use case, 1% may be too high. Think of an e-commerce site: you might allow some pages to fail here and there, but a failure at the end of the conversion funnel, right before buying, would be fatal to your business model. You would give this specific request a failure criterion of 0% and the rest either a higher percentage or no criterion at all.

All of this leads to a specification based on the Given-When-Then model and how to think about load testing in general, with everything we learned so far:

  • Given: Injection Profile
  • When: Scenario
  • Then: Acceptance Criteria

Using the previous examples, it can look like this:

  • Given: a load of 200 users per second
  • When: users visit the home page
  • Then:
    • We get at least 100 requests/second
    • 99 percentile under 250ms
    • And less than 1% failed requests

Finally, the Given-When-Then model can be fully integrated as a Gatling simulation code:

import io.gatling.core.Predef._
import io.gatling.http.Predef._

import scala.concurrent.duration._

class FullySpecifiedSimulation extends Simulation {

  val baseHttpProtocol = http
    .baseUrl("https://computer-database.gatling.io")

  // When
  val scn = scenario("simplest")
    .exec(http("Home").get("/"))

  // Given
  setUp(
    scn.inject(
      rampUsersPerSec(1) to 200 during 1.minute,
      constantUsersPerSec(200) during 9.minutes
    )
  ).protocols(baseHttpProtocol)
    // Then
    .assertions(
      global.requestsPerSec.gte(100),
      global.responseTime.percentile(99).lt(250), // percentile(...) requires Gatling 3.4+
      global.failedRequests.percent.lte(1)
    )
}

The Scenario (When) part didn’t change (yet), but the Injection Profile (Given) now sits in the same place, along with a new part, called assertions, acting as the Then. What assertions do is compute metrics over the whole simulation and fail the “build” if they are under/over the requirements. Using a Maven project, for example, you’ll see:

[INFO] ------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------
[INFO] Total time:  07:07 min
[INFO] Finished at: 2020-08-27T20:52:52+02:00
[INFO] ------------------------------------------------------------

And BUILD FAILURE otherwise. This is handy when using a CI tool such as Jenkins, for which we provide a Gatling Jenkins plugin. Using it, you can configure simulations to run right in your CI and get notified when your acceptance criteria fail, and by how much.

Note that in the previous example we used the keyword global as a starting point, but you could be as precise as requiring a single request to have no failures while ignoring all the others:

details("Checkout").failedRequests.percent.is(0) // "Checkout" is a hypothetical request name from your scenario

You can find more examples in our assertions documentation.

Defining a test protocol

If you believe that a specific variable, for example the network, is the culprit behind poor performance, you will need a proper testing protocol to verify your assumptions. Each variable should be tested against a “witness”: if you suspect “something” is the cause, test with and without it, then compare. Establish a baseline, make small variations, and test against that baseline.


Additional tooling will help capture the essentials that Gatling can’t. As all metrics given by Gatling are from the user’s point of view, you’ll need to make sure you are equipped with system and application monitoring on the other side of the spectrum. System monitoring is also very useful on the Gatling side: it will help you see whether you are using too many resources on the machine during a huge test, with information such as CPU, memory, network connections and their states, networking issues, etc.

There are many popular metrics collection solutions out there, such as Prometheus combined with Node Exporter, used alongside Grafana and an appropriate dashboard. Let it run and collect metrics all the time; when analyzing a run, focus on the time window of the test.

Equipped with such tools, the test protocol will boil down to:

  1. Ensure system and application monitoring is enabled
  2. Run a load test (capacity, for example)
  3. Analyze and deduce a limit
  4. Tune
  5. Try again

Going beyond

Closing your logbook, you realize you weren’t that far from succeeding after all. Rather than building a full-fledged load testing cathedral, you decide to go one small step at a time, understanding that knowledge is key. The sooner you have the information you need, the better.

From now on, whenever everything seems to work fine, you will be able to add complexity to the scenario, getting closer and closer to the way your users actually behave on your application. Your most helpful resources will be the Gatling OSS documentation, along with:

  • The Quickstart, so you can learn how to record this user journey without writing much code right at the beginning
  • The Advanced Tutorial to go deeper
  • The Cheat-Sheet that lists all the Gatling language keywords and their usage

About the Author

Guillaume Corré works as a software engineer, consultant, and tech lead at Gatling Corp, based in Paris at Station F, the world’s biggest startup campus. A Swiss army knife by nature, he enjoys simple things such as data viz, optimization, and crashing production environments using Gatling, the developer tool for load testing applications, preventing applications and websites from becoming victims of their own success and helping them face critical situations like go-lives or Black Fridays.
