
Gatling vs JMeter - What to Use for Performance Testing


Key Takeaways

  • Performance tools with a GUI might be problematic in long-term usage.
  • Readability and composability are the keys to long-lived performance tests.
  • Always check how performance metrics are implemented under the hood.
  • Performance testing as code is a promising alternative to the GUI-based approach.

JMeter and Gatling are two of the most popular performance testing tools. There's already a lot of content comparing these two projects out there - so why write another article about it? I'll try to compare the two tools from a slightly different angle. I've been using both for quite some time and I think it's time to summarize my experiences.

What Should Performance Testing Look Like in a Perfect World?

From my perspective as a developer, testing the performance of an application is the responsibility of the developer who built it. Thus, an approach in which a dedicated team of testers who have never touched the source code of the application must check its performance will not work. I don't mean that testers can't create good performance tests. Rather, I want to emphasize that if we have performance testers at our disposal, the testing process must be carried out in close cooperation between testers and developers, especially at the very beginning.

Over time, more and more responsibility may be transferred to the testing team, but developers will still be needed to analyze the results. This is especially true when the results differ from what we originally expected. Of course, this is very good for the developers because it creates a feedback loop on the quality of their solutions, similar to writing unit, integration, or End-to-End (E2E) tests.

Performance tests are undoubtedly among the most expensive tests in software development. In addition to the valuable tester or developer time needed to create them, they also require a dedicated (if possible) environment in which to run them. Applications change very dynamically, so performance tests should track these changes and be kept up to date. This is sometimes even more expensive than the original creation of the tests.

For the above reasons, my first piece of advice when it comes to performance testing is to think about the whole process in a long-term context. Ideally, performance tests should be part of our Continuous Deployment (CD) cycle. This is not always possible, and it does not always make sense. However, from my observations, taking shortcuts at the beginning of a performance testing effort may cost us a lot in the future.

How to Choose Tools for Performance Tests?

Choosing a tool for performance testing will be one of the first dilemmas, and I hope this article will help you make the right choice.

Again, from a developer's perspective, you should expect four main attributes from a good performance testing tool:

  1. readability,
  2. composability,
  3. correct math,
  4. distributed load generation. 

You may ask, “That's it?” To be honest - yes. These are the most important aspects to consider when choosing a tool. Of course, there are plenty of specific and useful features, but most tools provide a more or less similar set of options when it comes to creating test scenarios, simulating production traffic, etc. That's why I want to focus on these four points to show what really matters when choosing a tool that will be both convenient and maintainable.

If a tool meets these four basic requirements, only then can we proceed to analyzing the full spectrum of its functionality. Otherwise, tempted by some interesting feature that in the long run turns out to be neither so interesting nor so useful, we will end up with a suboptimal testing tool.

Test Readability

Ok, let's start with the first point. Comparing the readability of a GUI-based tool with that of source code is a rather unusual approach, but let's see what comes out of it. A very simple business flow in JMeter may look as follows:

At first glance - quite good. Everything is clear and it's easy to understand what a given test is testing. Unfortunately, if we start to expand our scenario to add new steps, parameterize the current ones, or change their behavior, then very soon we will come to the conclusion that this is very tedious work. You have to use the mouse a lot, know what is hidden where, and, what is worse, remember all the implicit connections (e.g. shared variables) between individual requests.

In my experience, sooner or later editing from the GUI ceases to be pleasant, and we switch to the underlying source code, which is XML and (typically for XML) completely unreadable. In XML we can at least use basic string-based refactoring techniques, such as find and replace.


<?xml version="1.0" encoding="UTF-8"?>
<jmeterTestPlan version="1.2" properties="5.0" jmeter="5.3">
  <hashTree>
    <TestPlan guiclass="TestPlanGui" testclass="TestPlan" testname="Test Plan" enabled="true">
      <stringProp name="TestPlan.comments"></stringProp>
      <boolProp name="TestPlan.functional_mode">false</boolProp>
      <boolProp name="TestPlan.tearDown_on_shutdown">true</boolProp>
      <boolProp name="TestPlan.serialize_threadgroups">false</boolProp>
      <elementProp name="TestPlan.user_defined_variables" elementType="Arguments" guiclass="ArgumentsPanel" testclass="Arguments" testname="User Defined Variables" enabled="true">
        <collectionProp name="Arguments.arguments"></collectionProp>
      </elementProp>
      <stringProp name="TestPlan.user_define_classpath"></stringProp>
      ...
      That's a very, very long XML.
      ...

The complete XML code is available here.

The exact same scenario, in Gatling, looks like this:


  val scn =
    scenario("Example scenario")
      .exec(http("go to main page").get("/"))
      .exec(http("find computer").get("/computers?f=macbook"))
      .exec(http("edit computer").get("/computers/6"))
      .exec(http("go to main page").get("/"))
      .repeat(4, "page") {
        exec(http("go to page").get("/computers?p=${page}"))
      }
      .exec(http("go to create new computer page").get("/computers/new"))
      .exec(
        http("create new computer")
          .post("/computers")
          .formParam("name", "Beautiful Computer")
          .formParam("introduced", "2012-05-30")
          .formParam("discontinued", "")
          .formParam("company", "37")
      )

It's very similar to JMeter, i.e. you can see what is being tested and what the overall flow is. Let's not forget, however, that this is source code that reads almost like normal sentences - the holy grail of all programmers. What should immediately come to mind is that since we are dealing with source code, we can use all known refactoring methods to expand the scenario or improve its readability.

Not to mention that Scala (used in Gatling) is a statically typed language, so most problems with the construction of scenarios will be caught as early as compile time. In JMeter, errors appear only when the scenario is launched, which definitely slows down the feedback loop.
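
As a quick illustration, here is a minimal sketch of that compile-time safety. The class name, base URL, and helper are illustrative only (the helper mirrors the editComputer extraction shown later in this article):


// A minimal sketch of compile-time checking in Gatling's Scala DSL.
// The class name, base URL and helper are illustrative, not taken from the article's test plan.
import io.gatling.core.Predef._
import io.gatling.http.Predef._

class CompileTimeSafetySketch extends Simulation {

  val httpProtocol = http.baseUrl("http://localhost:9000") // hypothetical base URL

  // The helper takes an Int, so a wrongly typed id never gets past the compiler.
  private def editComputer(id: Int) = http("edit computer").get(s"/computers/${id}")

  val scn =
    scenario("Compile-time safety example")
      .exec(editComputer(6))      // compiles
  //  .exec(editComputer("six"))  // does not compile: type mismatch (found String, required Int)

  setUp(scn.inject(atOnceUsers(1))).protocols(httpProtocol)
}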

Another argument in favor of source code is that it is very easy to version such tests and have them reviewed by many programmers, even from different teams. Good luck with that if you have to do it with thousands of lines of XML. 

Composability

Composability is crucial if we need to create multiple performance tests that share some logic, e.g. authentication, user creation, etc. In JMeter, we can create a copy-paste disaster very quickly. Even in this simple test plan, we have fragments that are repeated:

Over time, there will be more such places. Not only individual requests but entire sections of business logic will be duplicated. You can address this problem by using the Module Controller or by creating your own extensions in Groovy or BeanShell, but in my experience this is quite inconvenient and error-prone.

In Gatling, building reusable fragments is basically limited only by our programming skills. The first step may be to extract some methods so that they can be used multiple times.


  private val goToMainPage = http("go to main page").get("/")

  private def findComputer(name: String) = http("find computer").get(s"/computers?f=${name}")

  private def editComputer(id: Int) = http("edit computer").get(s"/computers/${id}")

  private def goToPage(page: Int) = http("go to page").get(s"/computers?p=${page}")

  private val goToCreateNewComputerPage = http("go to create new computer page").get("/computers/new")

  private def createNewComputer(name: String) =
    http("create new computer")
      .post("/computers")
      .formParam("name", name)
      .formParam("introduced", "2012-05-30")
      .formParam("discontinued", "")
      .formParam("company", "37")

  val scn =
    scenario("Example scenario")
      .exec(goToMainPage)
      .exec(findComputer("macbook"))
      .exec(editComputer(6))
      .exec(goToMainPage)
      .exec(goToPage(1))
      .exec(goToPage(1))
      .exec(goToPage(3))
      .exec(goToPage(10))
      .exec(goToCreateNewComputerPage)
      .exec(createNewComputer("Awesome computer"))

Next, we can divide our scenario into smaller fragments, then combine them to create a more complicated business flow.

 
  val search = exec(goToMainPage)
    .exec(findComputer("macbook"))
    .exec(editComputer(6))

  val jumpBetweenPages = exec(goToPage(1))
    .exec(goToPage(1))
    .exec(goToPage(3))
    .exec(goToPage(10))

  val addComputer = exec(goToMainPage)
    .exec(goToCreateNewComputerPage)
    .exec(createNewComputer("Awesome computer"))

  val scn =
    scenario("Example scenario")
      .exec(search, jumpBetweenPages, addComputer)
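
The same technique extends across scenarios. As a sketch of the shared authentication mentioned at the beginning of this section, a chain can be defined once and prepended to any number of scenarios (the /login endpoint and credentials below are hypothetical, purely for illustration):


  // A hypothetical shared authentication chain - the /login endpoint and
  // credentials are illustrative, not part of the sample application.
  val authenticate = exec(
    http("log in")
      .post("/login")
      .formParam("username", "perf-user")
      .formParam("password", "secret")
  )

  val browseScenario =
    scenario("Browse computers")
      .exec(authenticate)
      .exec(search)
      .exec(jumpBetweenPages)

  val adminScenario =
    scenario("Manage computers")
      .exec(authenticate)
      .exec(addComputer)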

If we have to maintain performance tests in the long run, then a high level of composability will undoubtedly be an advantage over other tools. From my observations, only tools that allow writing tests as source code - e.g. Gatling and Scala, Locust and Python, wrk2 and Lua - meet this criterion. If the tests are stored in text formats such as XML, JSON, etc., we will always be limited by the composability of those formats.

Correct Math

There's a saying that every performance tester should know: "lies, damned lies, and statistics." If they don't know it yet, they will surely learn it the painful way. A separate article could be written about why this sentence should be a mantra in the performance testing area. In a nutshell: the median, arithmetic mean, and standard deviation are completely useless metrics in this field (you can use them only as additional insight). You can get more detail on that in this great presentation by Gil Tene, CTO and co-founder at Azul. Thus, if a performance testing tool only provides these statistics, it can be discarded right away.

The only meaningful metrics for measuring and comparing performance are percentiles. However, you should treat even them with some suspicion and check how they were implemented. Very often the implementation is based on the arithmetic mean and standard deviation, which, of course, makes them equally useless.
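
Percentiles also lend themselves to automated pass/fail criteria. Below is a minimal sketch of how this could look in Gatling, reusing scn from the earlier examples together with a hypothetical httpProtocol configuration; the thresholds and injection profile are arbitrary, and percentile(99) assumes the Gatling 3 assertions API:


  // A sketch of percentile-based assertions. Thresholds and the injection profile
  // are arbitrary; httpProtocol is a hypothetical HTTP protocol configuration.
  import scala.concurrent.duration._

  setUp(
    scn.inject(rampUsers(100).during(1.minute))
  ).protocols(httpProtocol)
    .assertions(
      global.responseTime.percentile(99).lt(800), // 99th percentile below 800 ms
      global.successfulRequests.percent.gt(99)    // fewer than 1% failed requests
    )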

From the presentation above you can learn how to verify the correctness of percentiles.

Another approach is to check the source code of the implementation yourself. Regrettably, the documentation of most performance testing tools does not cover how percentiles are calculated. Even if such documentation exists, few people will read it and may thus fall into traps such as the Dropwizard Metrics implementation.

Without correct math and statistics, all our work on performance tests can be completely worthless, because we will not be able to understand the results of a single test or compare results with each other.

In my tests, I very often rely on graphs of percentiles changing over time, which can be obtained in both Gatling and JMeter. Thanks to this, we are able to tell whether the tested system suffers any performance hiccups during the test.

To compare the results of individual tests, you need global percentiles (available in both tools). However, I once ran into a pretty interesting problem with the accuracy of global percentiles in JMeter. Gatling, in its implementation, uses the HdrHistogram library to calculate percentiles, which offers a very reasonable compromise between accuracy and memory requirements.
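
For the curious, HdrHistogram is easy to experiment with on its own. Here is a minimal sketch using the org.hdrhistogram:HdrHistogram artifact directly; the recorded values are made up:


// A minimal sketch of HdrHistogram usage: record latencies and read percentiles
// with bounded memory and configurable precision. The recorded values are made up.
import org.HdrHistogram.Histogram

object HdrHistogramSketch extends App {
  // Track values up to one hour (in milliseconds) with 3 significant digits of precision.
  val histogram = new Histogram(3600000L, 3)

  // Made-up response times in milliseconds.
  Seq(12L, 15L, 18L, 250L, 19L, 14L, 1200L, 16L).foreach(histogram.recordValue)

  println(s"median = ${histogram.getValueAtPercentile(50.0)} ms")
  println(s"p99    = ${histogram.getValueAtPercentile(99.0)} ms")
  println(s"max    = ${histogram.getMaxValue} ms")
}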

Distributed Tests

There are some articles about the performance of performance testing tools themselves. This may be important up to a point, because we undoubtedly want to generate heavy traffic to properly "push" the tested system. The problem is that current applications are very rarely single instances running on a single machine. We are dealing with distributed systems operating as many instances on many machines (often in dynamically scalable cloud environments). A single machine running a performance test will not be able to generate enough load to test such an environment. Therefore, instead of focusing on which tool can generate more traffic from a single machine, it is better to check whether there is an option to run distributed tests from many machines at the same time.

In this case, we have a draw. We can distribute our tests manually in Gatling as well as in JMeter. In addition, we can use existing solutions that will do it for us automatically, like Flood or Gatling Enterprise. I definitely recommend the automated option, because it will save us a lot of valuable time.

Summary

Although the general tone of this article could be perceived as a roast of JMeter, that was not my intention, as I've used both of these tools. I used to see JMeter as the only sensible tool for performance testing, but once I started using Gatling, I didn't see a point in going back to JMeter.

A tool with a graphical interface will probably be easier to use at the very beginning, but the idea of performance tests as code definitely appeals to me more. Not to mention that Gatling's DSL is really pleasant and convenient to use. Tests are readable and much easier to maintain.

Many people are skeptical about Gatling because it requires learning a new programming language, Scala, which has a reputation for being difficult and hard to use. Nothing could be further from the truth, though. The language has its pros and cons, but in the context of Gatling, only basic knowledge of the syntax is required. If, on the other hand, you've always wanted to use Scala in your work but it wasn't possible for a variety of reasons, perhaps performance (and automated) tests will be a good way to gently introduce this language into your ecosystem. Be aware that since Gatling 3.7 you can also use it with Java! That will be the topic of my next article. Stay tuned.
