IBM Discusses Record Setting SPECj Results and the Benchmarking Process
InfoQ was able to talk to Andrew Spyker and John Stecher about the results. Andrew Spyker is a Senior Technical Staff Member (STSM) leading the IBM® WebSphere Application Server performance teams. Andrew’s specialty has been web services and SOA. John Stecher is the lead of the JEE performance teams for the WebSphere Application Server, and a lead developer and architect for both the current and future versions of the SPECjAppServer benchmark. John has worked on SPECjAppServer for the last five years.
Could you tell us a little about your group at IBM?
The WebSphere Application Server performance team is based out of Rochester, Minnesota and Research Triangle Park, North Carolina. Our team is committed to delivering to our customers industry leading performance with the WebSphere Application Server code base. We focus on “real world” customer scenarios which have inspired us to create a variety of internally developed benchmarks, as well as collaborate on standardized benchmarks like the SPECjAppServer benchmark. We are committed to understanding what our customers want in performance and delivering. The team is divided into segments looking at future technologies, current product release cycles, as well as service stream environments to ensure that from a performance perspective the WebSphere Application Server continues to meet and exceed our customers’ demands. The performance team develops performance tuning guides and best practices in order to educate our customers on how to achieve the best performance with their applications on the WebSphere Application Server. Lately, Andrew has also been directly addressing performance with our customers through the WebSphere Community Blog.
How long did it take from start to finish to perform the benchmark from an organizational standpoint (first run to numbers you were ready to submit)? How much improvement were you able to achieve versus out of the box results?
The benchmark takes about two to three weeks to run on these smaller hardware configurations, from unpacking the new hardware to submitting the final result to the SPEC organization. Larger results that involve much more hardware and complex network and database configurations can take anywhere from one to three months to complete. Out of the box results are typically improved upon significantly. Almost every software product on the market today is built to run on older hardware as well as brand new hardware, and defaults appropriate for that broad spectrum don't necessarily result in blazing out of the box performance on one specific hardware configuration. However, once you tune a small set of parameters, mainly the JVM heap size, to work with the total physical memory, you are well on your way. Typically, once you get the heap size set correctly, you can expect to get 20-25% better performance by tuning other parameters on this specific benchmark.
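The idea of sizing the heap against total physical memory can be sketched as a small calculation. This is only an illustration, not the procedure the IBM team used: the function names, the 2 GB operating system reservation, and the 25% allowance for non-heap JVM memory are all assumptions chosen for the example. The only real JVM options referenced are `-Xms` and `-Xmx`, and setting them to the same value is a common practice rather than something stated in the interview.

```python
def suggest_max_heap_mb(physical_mb, reserved_for_os_mb=2048,
                        jvm_count=1, non_heap_overhead=0.25):
    """Return a starting-point heap size in MB for each JVM on a host.

    All defaults are illustrative assumptions, not IBM recommendations:
    - reserved_for_os_mb: memory left for the OS and other processes
    - non_heap_overhead: extra native memory per JVM, as a fraction of heap
    """
    if physical_mb <= 0 or jvm_count < 1:
        raise ValueError("physical_mb must be positive and jvm_count >= 1")
    usable_mb = physical_mb - reserved_for_os_mb
    if usable_mb <= 0:
        raise ValueError("reservation exceeds physical memory")
    per_jvm_mb = usable_mb / jvm_count
    # Heap plus its native overhead must fit in the per-JVM share,
    # so the heap never forces the operating system to page.
    return int(per_jvm_mb / (1 + non_heap_overhead))


def heap_flags(heap_mb):
    """Format the standard -Xms/-Xmx options with equal min and max heap."""
    return [f"-Xms{heap_mb}m", f"-Xmx{heap_mb}m"]


if __name__ == "__main__":
    heap = suggest_max_heap_mb(physical_mb=16384)
    print(heap, heap_flags(heap))
```

A value computed this way is only a first guess. The answer that follows in the interview describes refining it by watching garbage collection behavior under load.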
What areas were you able to optimize the most?
In terms of tuning, the IBM® JVM and the thread pools were the areas that we optimized most in the benchmark. The JVM is probably the most critical. Getting tuning right on the JVM in terms of heap sizes and garbage collection policy can lead to significant improvements in performance. From there, tuning the thread pools inside the WebSphere Application Server is critical to this benchmark, but in most cases it is not as critical to customer applications getting the best performance. In benchmarking, when you are trying to squeeze every last transaction out of a system, you actually end up tuning for the specific hardware itself.
The SPECjAppServer benchmark has led to aggressive optimizations across the entire JEE set of containers. Over the years of participating in SPECjAppServer, significant optimizations were driven into the web container, the JSP engine, RMI and the ORB, the EJB container, persistence, the transaction manager, workload management, and the networking code. Also, as we mention later, optimizations were made in the JDK and the WebSphere Application Server to help achieve peak performance on server class hardware.
What tools did you use for tuning and are they readily available?
For tuning this benchmark the most important tools that we use are readily available and come with WebSphere Application Server and the included JVM. To get the heap sizes and JVM parameters correct we simply use verbose garbage collection and then look at our garbage collection pause times and intervals. We try to tune the heap so the pause times are minimal and the times between garbage collections are at a maximum. From there, inside of WebSphere Application Server itself, we use Performance Monitoring Infrastructure (PMI) and Tivoli® Performance Viewer (TPV) to monitor thread pool efficiency, connection pool usage, and other vital statistics on the WebSphere runtime.
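The two quantities mentioned above, pause times and the intervals between collections, can be summarized with a few lines of code once the events have been extracted from a verbose garbage collection log. The sketch below is a hedged illustration: the `GcEvent` record and `summarize` function are hypothetical names, and parsing is deliberately left out because the verbose GC log format differs between JVM vendors and versions.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class GcEvent:
    """One collection, already extracted from a verbose GC log (hypothetical type)."""
    start_s: float   # seconds since JVM start when the collection began
    pause_ms: float  # length of the pause in milliseconds


def summarize(events):
    """Report pause and interval statistics for a list of GcEvent records."""
    if len(events) < 2:
        raise ValueError("need at least two collections to measure an interval")
    ordered = sorted(events, key=lambda e: e.start_s)
    pauses = [e.pause_ms for e in ordered]
    # Interval = time from the start of one collection to the start of the next.
    intervals = [b.start_s - a.start_s for a, b in zip(ordered, ordered[1:])]
    window_s = ordered[-1].start_s - ordered[0].start_s
    if window_s <= 0:
        raise ValueError("events must span a non-zero time window")
    # Only pauses that begin before the last event fall inside the window.
    gc_fraction = sum(pauses[:-1]) / 1000.0 / window_s
    return {
        "mean_pause_ms": mean(pauses),
        "max_pause_ms": max(pauses),
        "mean_interval_s": mean(intervals),
        "gc_time_fraction": gc_fraction,
    }


if __name__ == "__main__":
    sample = [GcEvent(0, 100), GcEvent(10, 200), GcEvent(20, 100), GcEvent(30, 400)]
    print(summarize(sample))
```

Comparing these summaries across runs with different heap settings gives a simple way to see whether a change lowered the pauses, lengthened the intervals, or both.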
What was the biggest functional sacrifice you had to make for these performance improvements?
First, it is important to stress that we do not make sacrifices in our WebSphere Application Server codebase to participate in standardized benchmarking.
The biggest sacrifice that all vendors make when configuring SPECjAppServer is the lack of realistic high availability hardware topology configurations. To achieve the best performance with the minimal hardware for these benchmarks, all the vendors submitting results are running at nearly 100% CPU utilization on each piece of hardware. This is because benchmarks do not typically include failure scenarios. In the real world, if you wanted to have high availability, you would overprovision the hardware to handle failures. It would be interesting if benchmarks of the future had requirements for failures under load and with established service levels. But for now, this is a gap between standardized benchmarks and customer representative topologies.
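The gap between a benchmark running at nearly 100% CPU and an overprovisioned production topology comes down to simple arithmetic. The sketch below assumes identical servers with load spread evenly across them; the function names and all the numbers in the usage example are hypothetical and are not taken from any published result.

```python
import math


def max_safe_utilization(servers, failures_tolerated, ceiling=1.0):
    """Highest steady-state utilization per server that still lets the
    surviving servers absorb the load after the given number of failures.

    ceiling is the utilization the survivors may reach (1.0 = fully busy).
    """
    if failures_tolerated < 0 or servers <= failures_tolerated:
        raise ValueError("need more servers than tolerated failures")
    return ceiling * (servers - failures_tolerated) / servers


def servers_needed(peak_load, per_server_capacity, failures_tolerated, ceiling=1.0):
    """Servers required to carry peak_load with failures_tolerated servers down."""
    if per_server_capacity <= 0 or not 0 < ceiling <= 1:
        raise ValueError("capacity must be positive and ceiling in (0, 1]")
    # Enough servers to carry the load at the ceiling, plus spares for failures.
    survivors = math.ceil(peak_load / (per_server_capacity * ceiling))
    return survivors + failures_tolerated


if __name__ == "__main__":
    # Hypothetical: 4 servers that must survive the loss of 1.
    print(max_safe_utilization(4, 1))   # each server can run at most 75% busy
    print(servers_needed(3000, 1000, 1))
```

Under these assumptions a four-server cluster that must survive one failure cannot run above 75% utilization in normal operation, which is why a benchmark topology and a highly available one report different throughput for the same hardware.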
When we are doing regular performance testing, we configure the scenarios to have “customer realistic” high availability topologies to ensure these code paths also are optimal in customer production environments.
Some people feel that SPEC results do not translate well to real life applications. What are your thoughts on that?
SPEC results translate to real life as the competition on the benchmark has made application servers from all vendors today incredibly fast. Our result alone is doing nearly 1200 complex business level transactions per second on a single 4-core hardware system. The SPEC benchmark competition has caused vendors to optimize common code paths through different parts of the WebSphere Application Server and software stack that customers use every day. These benchmarks drive value to customers under the covers far more than any performance result on a webpage can represent.
Where these benchmarks do not translate well to real life is that you cannot just take a SPECjAppServer2004 number and expect to get the same performance out of your application using the same tuning. In most cases your application uses different programming methodologies, database access patterns or other things that require tuning specific to it to extract maximum performance. There are also many other functional areas that the WebSphere Application Server supports that are not tested by the SPEC benchmarks. This is exactly why internally we have developed and use a number of different benchmarks in order to improve all the code paths instead of just the SPEC specific ones. This is also why we are hard at work on the next version of SPECjAppServer and other new standardized benchmarks which continue our commitment to lead in performance in all customer focused scenarios.
Overall, one needs to use the SPECjAppServer numbers as a guide. These numbers show the overall potential of application server performance in a vendor neutral forum and open view. Publicly, SPEC and other standardized benchmarking organizations are the most trustworthy sources of performance information. However, in order to get the same performance yourself you have to put in some work and effort to tune the runtime to your applications’ behavior and patterns. This is a best practice we always recommend: do performance testing on your own applications early and often, and tune accordingly.
Would you expect these benchmarks to hold up on similar non-IBM hardware/software?
We would expect the benchmark to hold up with minor variations across similar non-IBM hardware and software stacks. We work very hard directly with performance engineers at other supported hardware and OS companies to ensure that we perform well on their hardware and OS stacks. We also work with performance engineers at other supported database companies to ensure that our database access mechanisms perform well.
What would you say influences the benchmark the most (JVM, CPU, database, etc.)?
This really is a difficult question. All benchmark results are affected by both hardware performance (CPU, memory, network, etc.) and software stack performance (JVM, database, application server, benchmark code, etc.), just as performance results are in the real world.
Most of the recent benchmark results highlight the latest hardware. Increases in hardware performance lead directly to the WebSphere Application Server software stack becoming faster. However, to exploit this newer hardware, we have focused on ensuring our software stack (both the JVM and the WebSphere Application Server) is ready for the hardware and exploits the hardware capabilities to the maximum. Two good examples of hardware changes and the software exploitation are 64-bit and massively multi-core CPUs. Our focus on 64-bit and multi-core started well before this hardware became commonplace in general consumers’ hands.
The WebSphere Application Server code has an equally strong influence on benchmark performance. As discussed earlier, the WebSphere Application Server code has undergone aggressive optimization to achieve the performance shown by this benchmark. Given the fact that the entire stack contributes to the benchmark performance, we must thank all the performance teams that have helped IBM® lead the industry in SPECjAppServer performance across our entire hardware and software stack: our Systems and Servers teams, our JVM/JIT teams and our DB2® teams.
An evaluation of the WebSphere Application Server V6.1 can be downloaded here.
SPEC is a non-profit organization that establishes, maintains and endorses standardized benchmarks to measure the performance of the newest generation of high-performance computers. Its membership comprises leading computer hardware and software vendors, universities, and research organizations worldwide. For complete details on benchmark results and the Standard Performance Evaluation Corporation, please see www.spec.org/. Competitive claims reflect results published on www.spec.org as of October 03, 2007 when comparing SPECjAppServer2004 JOPS@Standard per CPU Core on all published results.