Shift Left Performance Testing - a Different Approach
Embarking on a global IT Transformation program is both exciting and challenging for an organisation. Reducing the risk of failure is one of the key objectives throughout such a program. Testing is one of those activities that will significantly help to reduce the risk of failure.
This article explains a different approach to traditional Multi User Performance Testing: it uses the same tools but combines them with modern data visualisation techniques to gain early insight into location-specific performance and into application areas that may have "sleeping" performance issues.
Most programs concentrate first on functionality and second on anything else. Multi User Performance Testing, performed with tools like HP LoadRunner or Neotys Neoload, is usually one of those activities that happens late in the testing cycle. Often it runs in parallel with User Acceptance Testing, when the new system is already exposed to the end users. The chances of encountering a performance issue during Performance Testing are high; the opportunities to fix it are low, as there is usually little time left before go-live.
The reasons why Multi User Performance Testing usually happens late in a project are manifold and vary from project to project. The following two are among the most common:
- Multi User Performance Testing requires tools and automation. The effort required to automate performance test scripts is only viable just before Performance Testing is about to start, when the application is in a "stable" state of development. Assuming that automating a Performance Test script takes two days on average, a larger number of test scripts (e.g. 10-20) requires a significant amount of time and/or human resources. Changes to the application are not permitted during this phase, as they could jeopardise all of the Performance Test script development effort expended up to that point. As a result, any delays during the development phases, or a high number of significant changes that need to be developed, push Performance Testing further to the right in the timeline and put at risk its ability to be executed and to provide value.
- Multi User Performance Testing requires a production like environment. Especially during IT Transformation programs, such an environment will not necessarily exist at the beginning of the program and requires time, effort and money to build. This competes with setting up all the other required test environments for earlier test phases and "getting things done" on the functional side. The latest moment that a production like environment is absolutely required for a program is during deployment testing (or dress rehearsals). By then it will be too late for Performance Testing to have a positive impact on the outcome of the program.
As a risk mitigation activity Performance Testing is an absolute necessity. The challenge is how to do it differently, so it can be done earlier where there is more time and budget left to fix any performance related issues. Performance Testing should also be done more often during the testing life cycle and co-ordinated with major release cycles, therefore providing a better return on investment and continuously helping to reduce the risk of failure for a program.
The following objectives are probably similar to the objectives of most IT Transformation programs:
- Modernise the global IT to pave the way for future innovation by upgrading the ERP and CRM applications to the latest versions
- Rationalise the application landscape to reduce costs and complexities
- Centralise IT infrastructure to save costs and improve flexibility
The approach explained in this article was used during an IT Transformation program running in a medium-sized organisation that operates on a global level with an ever-expanding number of local offices around the world. Business is conducted face to face with customers in the local offices, using standard Customer Relations Management (CRM) and Enterprise Resource Planning (ERP) applications.
The key common attribute to both applications is the need to display a lot of data to the users whilst executing the business critical processes. In addition, these customer facing business processes also require large sets of data (hundreds of records per business process) to be entered and then later processed on top of each other. Testing the business processes using large datasets was planned to happen only during User Acceptance Testing (UAT). The main reason for this was that it was too time consuming to enter the data manually to cover all the required test cases in the time available. The Data Migration work stream was earmarked to deliver this data for UAT, however, due to delays, this was not expected to be available until UAT was to begin.
The previous versions of the CRM solution had reported performance issues. This application was already centralised and used by many locations around the world. The issues varied from location to location and were spread across different parts of the functionality. They were reported by end users, and no specific testing or timings were available.
The ERP solution previously had multiple versions installed standalone in different regions or countries. There were no reported performance issues. There was, however, a concern around future performance when running both the CRM and ERP applications from a centralised infrastructure, especially for customer-facing activities like data entry, scanning and printing.
Scalability testing, which is one key aspect of Multi User Performance Testing, was not so much of a concern to the program. Market-leading standard software was implemented with as little customisation as possible, and a correctly sized server infrastructure had been put in place for production.
Non-Functional Requirements were defined as part of the program but needed to be validated. The organisation had made an investment into HP LoadRunner years ago as the tool of choice for Performance Testing. Due to budget constraints, no further investments in testing tools were planned.
Taking all constraints and requirements into consideration, the program defined the following key objectives for Performance Testing:
- Measure the performance of the applications from the different office locations.
- Compare the performance results with the non-functional requirements.
- Provide recommendations for potential improvements.
The solution was to approach Performance Testing from a Single User perspective. Key business processes would be tested from all major locations in the world in a realistic and automated fashion over a number of months covering major application release cycles. Modern visualisation techniques would be used to significantly shorten the time to analyse and interpret the test results as well as presenting them to the program management. This Single User Performance Testing approach has a number of advantages over the "traditional" Multi User Performance Testing approach:
- It does not require a Production like environment but can use any existing test environment.
- The iterative approach allows the performance test scripts to mature with the application, and the frequent use provides a good return on investment. Instead of spending a lot of budget on many performance test scripters over a short period of time, with a significant risk of failing to deliver test results in time, the money can be spent on one or two performance test scripters for a longer period of time.
- Not all performance test scripts need to be ready and run at the same time.
- No workload profile is required.
- The automated scripts can be re-used for Multi User Performance Testing as soon as the production like test environment is available.
- Compared to the manual testing, the Single User Performance Testing can use much higher and more realistic data volumes within the business processes as data creation is automated.
The Single User Performance Testing solution was implemented using the following tools and approach:
- Resource constraints dictated that client execution performance (for example rendering time) was out of scope. Measuring it would require profilers on client machines (for example Google Developer Tools) or functional test automation tools in combination with network protocol analysers like Wireshark. The testing would initially concentrate only on server response time inclusive of the network time.
- In each office location, two users with desktops would be identified and the HP Load Generator software installed on their desktops. Two measurements from each location provide assurance that the measurements collected are correct: if the measurements for the same activity differ significantly between the two desktops at all times, there is an issue with the local user setup. This would not require any additional hardware investment, and the response time measurements would be taken from actual user locations, which makes them very realistic. The scripts used during this testing could be reused during Multi User Performance Testing and would provide a consistent and repeatable way to measure performance. No additional expenditure for test tool licences was required.
- The performance test scripts would be run every 15 minutes from each office location and every minute from an HP Load Generator in the data centre. This ensures that we do not fall into the trap of implicitly performing a Multi User Performance test; the key objective is to keep the concurrency on the test system to a minimum. The measurements taken from the data centre act as the "normal null", in other words the point of reference, and represent the server time. The difference in response time between an office location and the "normal null" can then be taken as the location-dependent part of the overall end user response time; this includes the combined impact of network delay, routing, chattiness and size on the end user performance experience. The conclusions that can be drawn from this measurement depend on the setup of the test environment. The overall recommendation is that test environments should only be horizontally scaled-down versions of production environments, with CPU speed, memory per concurrent user and disk I/O speed comparable to production. The "normal null" can then be taken as a true representation of the server performance.
- Network delay (ICMP) and routing (tracert) were monitored, as was the client throughput, which is part of the standard HP LoadRunner results. The combination of these three measurements allows the quality of the network, the routing and the bandwidth limitations to be assessed.
- By using the lower-level HP LoadRunner Web/HTML protocol, the chattiness of the application is recorded automatically and its impact can be assessed. Using a higher-level HP LoadRunner protocol, for example the Ajax TruClient protocol, would require additional effort to investigate the degree of chattiness with additional tools (for example Wireshark or the Google developer toolset).
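The "normal null" comparison described in the list above amounts to a simple subtraction. The following Python sketch illustrates it; the function name `location_overhead` and the sample figures are hypothetical and were not part of the original tooling.

```python
def location_overhead(office_seconds, normal_null_seconds):
    """Return the location-dependent part of a response time.

    office_seconds: response time measured from an office location.
    normal_null_seconds: response time of the same transaction measured
    from the Load Generator in the data centre (the point of reference).
    """
    # Clamp at zero: measurement noise can make an office look slightly
    # faster than the data centre for a very short transaction.
    return max(0.0, office_seconds - normal_null_seconds)


# Hypothetical figures: 1.2 s in the data centre, 4.7 s from an office.
overhead = location_overhead(4.7, 1.2)
print(f"location-dependent time: {overhead:.1f} s")
```

Whether this difference can be read as pure network and location impact depends, as noted above, on how closely the test environment matches production.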
This overall Single User Performance Testing (SUPT) approach could have been implemented in a similar fashion using other Performance Testing or Performance Monitoring tools, such as HP Business Availability Centre (HP BAC).
All performance test scripts were executed from all office locations for a period of time that was long enough to provide valuable and valid data. The strategy to use two Load Generators per location also proved beneficial for building confidence in the test results and also helped to identify and resolve differences in user setups.
The "normal null" provided the ability to test against different environments like System Integration, User Acceptance or Pre-Production, and by comparing response times from different locations relative to the response times measured in the data centre distinguish between application, environment and location specific performance issues.
The prerequisite for this is that iterations of the performance test scripts run in the data centre every minute. All other locations iterate at 15 minute intervals (staggered start with 1 minute gaps between locations). The aim is not to load the server but to observe the difference in response time between running the same processes at the server and at the remote location.
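The staggered schedule can be illustrated with a short Python sketch. It is only an illustration of the timing, not how HP LoadRunner schedules its Load Generators; the helper names and the office labels are hypothetical.

```python
def start_offsets(locations, gap_minutes=1):
    """Map each office location to its start offset in minutes."""
    return {name: index * gap_minutes for index, name in enumerate(locations)}


def run_times(offset, interval_minutes, horizon_minutes):
    """List the minutes at which an iteration starts within the horizon."""
    return list(range(offset, horizon_minutes, interval_minutes))


offices = ["A2", "A3", "A4"]  # hypothetical office locations
offsets = start_offsets(offices)
for office in offices:
    # Each office iterates every 15 minutes, shifted by its own offset.
    print(office, run_times(offsets[office], 15, 60))

# The data centre ("A1") iterates every minute as the point of reference.
print("A1", run_times(0, 1, 5))
```

With 1 minute gaps and a 15 minute interval, no two offices start an iteration in the same minute as long as there are no more than 15 offices.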
Once the results were available, the challenge was to analyse this data quickly and efficiently to provide information and analysis on:
- The worst performing locations.
- The worst performing test scripts (business processes).
- The performance against the Non-Functional requirements (SLAs).
- Routing and network delay information that could be verified by the network management team.
- Suggestions on how to improve the performance.
The results of the SUPT were large data tables of response times per transaction per location. The 90th percentile was chosen as the representative value for the response time.
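For readers who want to reproduce the representative value outside the analysis tool, the following Python sketch computes a 90th percentile with the nearest-rank method. This is an assumption for illustration: HP LoadRunner Analysis may use a different percentile definition, and the sample values are invented.

```python
import math


def percentile_nearest_rank(values, percent):
    """Return the nearest-rank percentile of a non-empty list of numbers."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    # 1-based rank of the smallest value covering `percent` of the samples.
    rank = math.ceil(percent * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical response times in seconds for one transaction at one location.
samples = [1.0, 1.2, 1.1, 1.3, 0.9, 1.0, 1.4, 1.2, 1.1, 6.0]
print(percentile_nearest_rank(samples, 90))
```

The single 6.0 second outlier does not affect the result, which is one reason a percentile is a more robust representative value than the maximum or the average.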
The challenge was to find a tool that could help to analyse this data quickly. Searching through myriads of visualisation tools and techniques, I came across Mike Bostock's "D3" projects, from there circular visualisations, and from there Circos, which I found to be the best tool for the job.
"Circos is a software package for visualizing data and information. It visualizes data in a circular layout — this makes Circos ideal for exploring relationships between objects or positions. This is ideal for tables. There are other reasons why a circular layout is advantageous, not the least being the fact that it is attractive."
There are a number of tools available that make working with Circos easier. One of them is the Table Viewer tool, which is also available online.
Downloading and setting up Circos on Linux is recommended. For the first analysis, though, it is easiest to use the online version of the Table Viewer tool.
The performance test results need to be exported into a data table that should look like the following image. The data table also needs to conform to the following requirements:
- No duplicate transaction names. It is recommended to use the “S01_xx” convention for transactions that are the same in all business processes (i.e. S01_Login)
- No spaces in the transaction or location names.
- Locations are the columns (“A1”, “A2”, etc).
- HP LoadRunner transaction names are the rows (“da”, “db”, etc).
- Rename the first Column to “Label”.
- Keep the names short.
- Save the file as a “Tab” delimited file.
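The requirements above can be checked while the file is written. The Python sketch below is one possible way to do this; the function name `write_table` and the sample values are hypothetical, and it only covers the rules listed here, not everything the Table Viewer may validate.

```python
import csv
import io


def write_table(rows, locations, out):
    """Write (transaction, values) pairs as a tab-delimited table.

    rows: list of (transaction_name, [one value per location]).
    locations: column names, in the same order as the values.
    """
    transactions = [name for name, _ in rows]
    for name in list(locations) + transactions:
        if " " in name:
            raise ValueError(f"name contains a space: {name!r}")
    if len(set(transactions)) != len(transactions):
        raise ValueError("duplicate transaction name")
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["Label"] + list(locations))  # first column is "Label"
    for name, values in rows:
        if len(values) != len(locations):
            raise ValueError(f"{name}: expected {len(locations)} values")
        writer.writerow([name] + list(values))


buffer = io.StringIO()
write_table([("da", [2, 7, 9]), ("db", [1, 3, 4])], ["A1", "A2", "A3"], buffer)
print(buffer.getvalue())
```

Writing to an open file instead of `io.StringIO` produces the file to upload.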
Just using the default configuration for the on-line Table Viewer already produces amazing results. The configuration files used to generate the graph online can be downloaded and used with the local installation.
The result is an image like the one below. The key to understanding this graph is that, starting at 12 o'clock and going clockwise, one can see the HP LoadRunner response time measurements, also referred to as transactions (lower case letters), and, going anti-clockwise, the locations (upper case letters). The thickness of a location segment is the sum of all response times measured from that location. A histogram is also produced, which is very useful for further analysis. Just by visual comparison, one can immediately point out the locations that are impacted by slow performance and the specific processes and steps this applies to.
As the number of transactions can be quite large, this graph may become less informative for analysing transactions. The next step is to filter out the response times that are not an issue. So for example, transactions that have a response time less than the lowest SLA (5 seconds as an example) can be filtered out. There are two ways this can be achieved using the Table Viewer tool. One is to filter out these transactions but to still have the overall calculations (the histogram) include the values.
This graph really highlights the worst performing transactions and their percentage contribution to the overall business process performance and provides the basis for understanding the potential for performance improvements.
To view the same information in a more compact and zoomed view, the transactions that pass the SLAs can be excluded from the overall calculations as well.
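As an alternative to the Table Viewer's own filter settings, the table can be reduced before it is uploaded. The following Python sketch shows this pre-filtering under the assumption that a transaction is dropped only when it stays below the threshold in every location; the function name and values are hypothetical.

```python
def drop_passing(rows, threshold_seconds):
    """Keep transactions that reach the threshold in at least one location.

    rows: list of (transaction_name, [one response time per location]).
    """
    return [
        (name, values)
        for name, values in rows
        if max(values) >= threshold_seconds
    ]


rows = [
    ("da", [2.0, 7.5, 9.1]),  # breaches 5 s in two locations
    ("db", [1.0, 3.0, 4.2]),  # below 5 s everywhere
]
print(drop_passing(rows, 5.0))
```

Because the dropped rows never reach the tool, this corresponds to the compact view in which passing transactions are excluded from the overall calculations as well.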
It is now possible to clearly identify that “fa”, “ea”, “dz” and “da” are the worst performing transactions and the best performing locations are “A1”, “A2” and “A8”. The “A1” location is the data centre.
The aim is now to achieve a similar view of the network delay and routing. As a first step, the total network delay information for the route from source to destination in HP LoadRunner Analysis needs to be grouped by Load Generator (this is a feature of the Analysis tool). This information then needs to be exported to a spreadsheet.
The objective is to verify that the network delay is within the expected SLAs. The following graph compares the network delay measurements to known SLA thresholds from source to destination. To understand whether any of the included locations breaches the SLA, the information on host location needs to be added. For example, if HOST11 is located in Europe, then the SLA is passed. Located anywhere else, it would fail.
The throughput per location is also compared to the throughput at the point of reference and should be compared to the assumed available bandwidth at the location as well as the actual bandwidth used at the data centre location. The data centre location measurements will determine the maximum values that are possible to achieve assuming near unlimited bandwidth (i.e. Gigabit Ethernet).
The next step is to get a more detailed breakdown of routing and network delay to provide more context for the Network Management team. The basis for this analysis is the Network Segment Delay Graph in the HP LoadRunner Analysis tool.
Sankey diagrams are directed graphs, where the size of the arrow line represents the quantity of flow. Sankey diagrams are used for a variety of flow visualizations. These are not limited to energy or material, but extend to any (real or virtual) matter that flows "from" a source "to" a destination. Sankey diagrams draw the viewer's attention to the largest flows, while at the same time showing the proportions of the flows among each other and indicating a "from-to" flow direction. Sankey diagrams are named after Captain Matthew Henry Phineas Riall Sankey, an Irish engineer (1853-1925). (www.e-sankey.com, en.wikipedia.org/wiki/Sankey_diagram, www.sankey-diagrams.com)
The following graph was created using the Sankey example from http://www.d3noob.org/ and the network segment delay data extracted from the HP LoadRunner Analysis tool.
The steps to produce this graph are:
- Download the D3noob example files to a directory.
- Extract the Network Segment Delay data from the HP LoadRunner Analysis tool as a spreadsheet and then copy it into the “Sankey.xls” in the “data” subdirectory.
- Ensure that there are no zero values for the network delay, setting 0 values to 0.01 for example.
- Ensure that you have the headings as shown below and save it as a “.csv” file.
- Refresh the local URL and you will see a graph similar to the above.
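The zero-value replacement and the CSV export in the steps above can also be scripted. The Python sketch below assumes the column headings `source`, `target` and `value`; these are an assumption, so check them against the headings in the example files you downloaded. The segment data is invented.

```python
import csv
import io


def to_sankey_csv(segments, out, floor=0.01):
    """Write (source, target, delay) tuples as a CSV for a Sankey diagram."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["source", "target", "value"])  # assumed headings
    for source, target, delay in segments:
        # Zero delays are lifted to a small floor so that every
        # segment still appears as a visible link.
        writer.writerow([source, target, delay if delay > 0 else floor])


# Hypothetical network segments with their delay in milliseconds.
segments = [("HOST11", "E", 12.0), ("E", "HH", 0), ("HH", "J", 3.5)]
buffer = io.StringIO()
to_sankey_csv(segments, buffer)
print(buffer.getvalue())
```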
Sankey diagrams are great to understand individual network paths and delay. When trying to display the complete network the limits of the d3noob examples came to light. The requirement was the ability to visualise complex network graphs with a tool that was easy to use. There are many tools around, especially in the area of social network analysis. NodeXL was chosen as it was the most suitable for this purpose.
"NodeXL is a free, open-source template for Microsoft® Excel® 2007, 2010 and (possibly) 2013 that makes it easy to explore network graphs. With NodeXL, you can enter a network edge list in a worksheet, click a button and see your graph, all in the familiar environment of the Excel window." (http://nodexl.codeplex.com/)
The expectation was that by putting all the network delay segment information into one tool it would help us to identify any particular routing issues. NodeXL allows for a lot more visual effects which have not been employed here (i.e. pictures for nodes).
Well performing network segments can be filtered out easily using the “Dynamic Filter” function based on the label. The label in this case is the numerical representation of the network delay. What has not been explored is the possibility to create interactive sums of source to destination network delay calculations. Even though it would be an interesting feature, it is not required as we have the overall source to destination measurements from the previous analysis.
To investigate the correct routing directions the graph properties can be changed from undirected to directed.
In this example the “diamond” in the middle looks suspicious. The tool allows for easy zooming in.
This shows that some of the traffic is routed from hop “E” to “J” through two different paths, namely via “HH” and via “II”. The route via “II” has a higher latency than the route via “HH”. This information can be passed on easily to the Network Management team and communicated well within the project. The expectation is that the Network team would investigate why the "diamond" exists and provide an appropriate solution.
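A "diamond" like this can also be found programmatically from the same edge list that is entered into NodeXL. The following Python sketch enumerates all loop-free paths between two hops and sums their delays; the function name and the delay values are hypothetical, and the approach is only practical for small graphs.

```python
def all_paths(edges, start, end):
    """Return every loop-free path from start to end with its total delay.

    edges: list of (source, target, delay) tuples of a directed graph.
    """
    graph = {}
    for source, target, delay in edges:
        graph.setdefault(source, []).append((target, delay))

    paths = []

    def walk(node, visited, total):
        if node == end:
            paths.append((visited, total))
            return
        for target, delay in graph.get(node, []):
            if target not in visited:  # avoid walking in circles
                walk(target, visited + [target], total + delay)

    walk(start, [start], 0.0)
    return paths


# Hypothetical delays in milliseconds for the two routes from "E" to "J".
edges = [("E", "HH", 10.0), ("HH", "J", 5.0), ("E", "II", 40.0), ("II", "J", 5.0)]
for hops, delay in sorted(all_paths(edges, "E", "J"), key=lambda p: p[1]):
    print(" -> ".join(hops), delay)
```

More than one path between the same pair of hops indicates a diamond, and the difference between the totals shows how much latency the slower route adds.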
After going through the steps above, application and location performance should no longer be much of a mystery, and it is time to start concentrating on critical performance improvements. To understand which improvements are critical, the performance test results need to be compared to the Non-Functional requirements. This can be achieved using Circos with filtering or using standard Excel visualization capabilities with graphs and tables.
The table below shows a summary analysis of the response times per location. The SLA column values are retrieved from a lookup table to ensure easy management of SLAs in case there are any changes.
The summary of this table can be shown in a spider web graph like the sample below.
The same method would need to be applied to Network Delay and Throughput SLA measurements and comparison.
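The SLA comparison behind such a summary can be sketched in Python as follows. The SLA values, transaction names and measurements are hypothetical; the lookup dictionary plays the role of the lookup table described above, so an SLA change only has to be made in one place.

```python
# Hypothetical SLA lookup table: transaction name -> limit in seconds.
SLA_SECONDS = {"S01_Login": 5.0, "da": 8.0}


def sla_summary(measured, sla_lookup):
    """Return {location: (passed, failed)} counts across all transactions.

    measured: {transaction: {location: response time in seconds}}.
    """
    summary = {}
    for transaction, by_location in measured.items():
        limit = sla_lookup[transaction]
        for location, seconds in by_location.items():
            passed, failed = summary.get(location, (0, 0))
            if seconds <= limit:
                summary[location] = (passed + 1, failed)
            else:
                summary[location] = (passed, failed + 1)
    return summary


measured = {
    "S01_Login": {"A1": 1.2, "A2": 4.7, "A3": 6.3},
    "da": {"A1": 2.0, "A2": 7.5, "A3": 9.1},
}
print(sla_summary(measured, SLA_SECONDS))
```

The same structure works for network delay and throughput once the corresponding limits are added to a lookup table.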
We now have the following key findings:
- We know the Steps in the Business Processes that have Performance issues.
- We know the Locations that have Performance issues.
- We know the network delay and bandwidth measurements.
The next step is now to drill down into more detail for the business processes with Performance issues. The previous steps enabled us to come to this point very efficiently.
The following questions now need to be answered:
- Does the application download many static objects? If yes, can these objects be cached locally to avoid a download over the network or can the number be reduced or can objects be combined?
- Is the application chatty, and does the chattiness increase with the amount of data processed? In the example situation, the application made an update call to the server for every item in an order form. This is not a problem for 10 items during manual testing, but with 500-1000 items at real data volumes during SUPT it had a significant impact on performance.
- Does the application download or upload large objects like documents, images etc.? Scanner settings can significantly increase the size of a scanned document without a corresponding improvement in quality. The file sizes of scanned documents should be checked, as they significantly impact upload and download times. In addition, increases in bandwidth will help here significantly.
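The effect of chattiness on a remote location can be estimated with simple arithmetic: if every item triggers one sequential call, the time spent waiting on the network grows linearly with the number of items. The Python sketch below uses a hypothetical round trip time of 200 ms and ignores server processing time.

```python
def network_wait_seconds(calls, round_trip_ms):
    """Time spent purely waiting on round trips for sequential calls."""
    return calls * round_trip_ms / 1000


ROUND_TRIP_MS = 200  # hypothetical latency between an office and the data centre
for items in (10, 500, 1000):
    print(items, "items:", network_wait_seconds(items, ROUND_TRIP_MS), "s")
```

Under these assumptions, 10 items cost about 2 seconds of waiting, while 1000 items cost more than 3 minutes, which is why such an issue stays hidden during manual testing with small data sets.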
In summary, the objective of this Single User Performance Testing approach is to provide early insight into application performance from different internal locations around the world using real-life data volumes. Using appropriate modern visualisations helps to analyse performance-related data and identify performance issues very quickly. Using the above methods and tools also ensures that the results can be passed around the program and understood by technical and non-technical people alike.
About the Author
Bernhard Klemm has been working in the IT industry for over 16 years in varying positions and industries as Developer, Architect, Performance Tuning Expert, Project/Program Manager, Engagement Manager and Test (Program) Manager. He can be found on LinkedIn or contacted via e-mail.