Using the Plan-Do-Check-Act Framework to Produce Performant and Highly Available Systems

Key Takeaways

  • With our focus on adopting newer and diverse technologies, making performance and availability a priority still holds the key to success.
  • Although new features are important, customers expect software applications to provide accurate information (reliability) and responses as and when needed (performance).
  • Non-functional requirements (NFRs) are the key input for planning infrastructure, architecture, design, performance testing, networking, and application monitoring.
  • There are certain best practices and guidelines that can be adopted for achieving the goal of performant and highly available systems. These include infrastructure design and setup, application architecture and design, coding, performance testing, and application monitoring.
  • As part of continuous improvement it is critical to focus on analysis of the performance testing results, application monitoring data, and incidents in production.

Today organizations are aggressively setting the pace on the digitalization journey, embracing advanced technologies and modernizing IT. On this journey, one element that remains unchanged and highly critical is ensuring that business applications remain performant and available in line with the business goals.

Irrespective of the underlying technology (traditional platforms, low code/no code, serverless, analytics, robotics, etc.) or landscape (public cloud, private cloud, or on-premise), application performance management (APM) and high availability (HA) are, and will continue to be, key to achieving higher customer satisfaction for any strategic business application.

It is not just the new features or the application usability aspects which are valued by the customer. Customers are also looking for the business-critical applications to provide accurate information (reliability) and responses when needed (performance). 

The objective of this article is to use the PDCA (plan-do-check-act) framework to outline the activities related to software application performance, availability, and monitoring, along with other best practices that help IT teams deliver performant and highly available applications.

There are various tools available in the market that help implement the activities discussed as part of the framework. As the choice of tools is highly dependent on individual use cases, technology stack, and enterprise architecture guidelines, they are not discussed in this article.

The below diagram depicts the key high-level activities under each of the phases.  

Plan

Infrastructure

The hardware sizing activity analyzes the requirements (number of transactions, total number of users, concurrent and peak user load, background processing, attachment storage, data retention, inbound and outbound connector/service calls, availability, failover, etc.) to arrive at a high-level understanding of the infrastructure setup. Depending on the application requirements relating to geographic reach and multiple availability zones (AZs), an edge setup may need to be considered. It is also recommended to consider whether separate nodes for background processing are necessary to achieve overall performance needs.
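As a rough illustration of how sizing inputs can translate into a first estimate, the sketch below applies Little's Law (average requests in flight = arrival rate × average response time). The function name, the input figures, the per-node capacity, and the headroom factor are all hypothetical assumptions for illustration, not values from any real system.

```python
import math


def estimate_nodes(peak_requests_per_sec: float,
                   avg_response_time_sec: float,
                   concurrent_requests_per_node: int,
                   headroom: float = 0.3) -> int:
    """Rough first-pass node count for a sizing exercise.

    Little's Law: requests in flight = arrival rate * average response time.
    'headroom' reserves spare capacity (0.3 keeps 30% of each node free).
    """
    in_flight = peak_requests_per_sec * avg_response_time_sec
    usable_per_node = concurrent_requests_per_node * (1 - headroom)
    return max(1, math.ceil(in_flight / usable_per_node))


# Hypothetical inputs: 400 requests/s at peak, 0.5 s average response time,
# each node assumed to handle 50 concurrent requests.
nodes = estimate_nodes(400, 0.5, 50)
print(nodes)  # 200 requests in flight / 35 usable per node -> 6 nodes
```

An estimate like this is only a starting point; the assumptions behind it are the ones to re-validate during performance testing.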

Although the hardware sizing exercise may not be initially accurate, it is a good start to proceed with the initial set up and start the development phase. Once the code is migrated to the pre-production environment, the assumptions and the requirements can be re-validated during the performance testing.

Architecture and Design

In this stage, the technical approach that aligns with the requirements is identified. For example, imagine there is a requirement to attach large files as part of the user request. To meet such a requirement, an optimum architecture and design approach is critical to avoid performance issues. Analyze the available technical options, such as storing the files in a database versus leveraging a cloud repository like S3, or processing the files via browser-based uploads so that web server memory is used optimally and adverse impacts on performance are avoided. Based on the requirements, also consider the optimal database connection pool size. If user or requestor passivation details need to be stored, weigh a file server system against a database.

Non-Functional Requirements

The NFRs (non-functional requirements) are the foundation for validating performance and availability. They identify and document the performance and availability goals and other related requirements. It is critical to understand the non-functional requirements before designing meaningful performance testing scenarios. The business users may want a specific screen or report to be displayed within a certain amount of time. Or, in the case of a financial application managing collateral, the requirement could be to have near-real-time comparisons of loans against collateral.
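One way to make a requirement such as "this report must display within a certain amount of time" testable is to state it as a percentile target and evaluate it against measured response times. The sketch below is a minimal illustration; the class name, the 95th percentile target, the 2-second limit, and the sample data are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ResponseTimeNfr:
    """A response-time requirement, e.g. '95% of loads finish within 2 s'."""
    name: str
    percentile: float   # e.g. 95 for the 95th percentile
    max_seconds: float


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples <= it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def meets(nfr: ResponseTimeNfr, samples: list[float]) -> bool:
    """True when the measured percentile is within the required limit."""
    return percentile(samples, nfr.percentile) <= nfr.max_seconds


# Hypothetical requirement and measurements (seconds).
report_nfr = ResponseTimeNfr("collateral report", percentile=95, max_seconds=2.0)
measured = [0.8, 1.1, 0.9, 1.4, 1.0, 1.2, 0.7, 1.3, 1.6, 3.5]
print(percentile(measured, 95), meets(report_nfr, measured))  # 3.5 False
```

Stating the target as a percentile rather than an average keeps a single slow outlier, like the 3.5-second sample here, from being hidden.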

Performance Testing Strategy

Once there is clarity on the non-functional requirements, the performance testing strategy is designed. The requirements need to be critically analyzed from a validation perspective: can all the in-scope scenarios be tested? Next, the high-level and detailed scenarios are identified to ensure that the performance testing scripts mimic the user actions. The test usage patterns should match the user load on your application at specific points of time in the day so that scenarios can be designed according to the workload model. Additionally, identify the approach for test data and setup. It is ideal to have a production-like environment for performance testing. However, in case of an environmental mismatch, an extrapolation model will be needed to correlate the test performance with the actual production environment.

If there are no non-functional requirements specified, the first run results are captured and discussed with business stakeholders to determine the baseline for comparison in consecutive runs. The comparison of performance against the baseline for each test run helps to determine the impact of new changes. It is crucial to identify the frequency for doing performance testing, such as every sprint, depending on the kind of changes planned for production.
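The baseline comparison described above can be sketched as a small function that flags scenarios which became noticeably slower than the agreed baseline. The scenario names, timings, and the 10% tolerance are hypothetical; an appropriate tolerance would be agreed with the business stakeholders.

```python
def compare_to_baseline(baseline: dict[str, float],
                        current: dict[str, float],
                        tolerance: float = 0.10) -> dict[str, float]:
    """Return scenarios whose response time grew by more than `tolerance`.

    Values are relative changes, e.g. 0.25 means 25% slower than the baseline.
    Scenarios missing from the current run are ignored in this sketch.
    """
    regressions = {}
    for scenario, base in baseline.items():
        if scenario in current and base > 0:
            change = (current[scenario] - base) / base
            if change > tolerance:
                regressions[scenario] = round(change, 3)
    return regressions


# Hypothetical average response times in seconds per scenario.
baseline_run = {"login": 0.40, "search": 1.20, "submit_request": 2.00}
sprint_run = {"login": 0.42, "search": 1.50, "submit_request": 1.90}
print(compare_to_baseline(baseline_run, sprint_run))  # {'search': 0.25}
```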

Monitoring Strategy

As part of the monitoring strategy, identify the tools and approaches for availability monitoring and enabling notifications for performance alerts and exceptions.
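To make an availability goal concrete when choosing monitoring approaches, it can help to translate the target into a downtime budget and to decide when a notification should fire. The following is a minimal sketch; the 99.9% target, the 30-day period, and the probe data are hypothetical assumptions.

```python
def allowed_downtime_minutes(availability_target: float, period_days: int = 30) -> float:
    """Downtime budget implied by an availability target (e.g. 0.999)."""
    return (1 - availability_target) * period_days * 24 * 60


def availability(probe_results: list[bool]) -> float:
    """Fraction of health-check probes that succeeded."""
    if not probe_results:
        raise ValueError("no probe results")
    return sum(probe_results) / len(probe_results)


def should_alert(probe_results: list[bool], availability_target: float) -> bool:
    """True when measured availability falls below the target."""
    return availability(probe_results) < availability_target


# Hypothetical data: 1000 probes, 3 of which failed.
probes = [True] * 997 + [False] * 3
print(round(allowed_downtime_minutes(0.999), 1))          # 43.2 minutes per 30 days
print(availability(probes), should_alert(probes, 0.999))  # 0.997 True
```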

Continuous Integration/Continuous Delivery

One of the crucial steps during the planning phase is deciding how the performance and monitoring activities will be integrated into the automated continuous integration and continuous delivery pipelines. For example, including checks for performance can help resolve issues early in the development lifecycle. Performance testing can be embedded in the pipeline to gain insights on the performance parameters and reject code that does not meet those parameters.
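A pipeline step that rejects code failing its performance parameters could look roughly like the sketch below. It assumes the performance test tool writes a JSON report; the metric names (`p95_response_ms`, `error_rate_pct`) and the budgets are hypothetical, since real tools each have their own report formats.

```python
import json


def gate(report_json: str, budgets: dict[str, float]) -> int:
    """Return a process exit code: 0 if every metric is within budget, else 1."""
    metrics = json.loads(report_json)
    failures = []
    for name, limit in budgets.items():
        value = metrics.get(name)
        if value is None:
            failures.append(f"{name}: missing from report")
        elif value > limit:
            failures.append(f"{name}: {value} exceeds budget {limit}")
    for line in failures:
        print("PERF GATE FAILED -", line)
    return 1 if failures else 0


# Hypothetical report and budgets. A real pipeline step would read the report
# from a file and call sys.exit(gate(...)) so a non-zero code fails the build.
report = '{"p95_response_ms": 850, "error_rate_pct": 0.4}'
exit_code = gate(report, {"p95_response_ms": 800, "error_rate_pct": 1.0})
print(exit_code)  # 1, because the p95 response time is over budget
```

Treating a missing metric as a failure is a deliberate choice in this sketch, so that a broken test run cannot silently pass the gate.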

Do

As part of the ‘Do’ phase, performance testing is performed to validate the application for go-live. Performance testing is recommended in scenarios such as:

  • software or hardware upgrades
  • environment migrations (e.g. on-premise to cloud)
  • database migrations (e.g. Oracle to PostgreSQL)
  • major refactoring effort
  • major functional changes or enhancements
  • the application is extending into additional geographies
  • new user types are onboarded

Prior to starting performance testing, ensure that the functional testing cycle (system and regression) is complete, all related issues are resolved, and the application is stable. During development, follow design and coding best practices such as caching, query optimization, reducing database I/O, and flushing memory/clipboard after processing.

It is always helpful to have a process set up for code reviews, as it validates whether the coding best practices are being followed. For applications built over an extended time or by multiple teams, consider allocating time for code refactoring. This is one area that is often neglected in the rush to get new functionality out. However, it has been observed that refactoring (that is, paying down technical debt) can help to achieve positive results.

The main objective of performance testing is to understand the limits of your application in terms of response time and throughput and to validate the efficacy of the underlying infrastructure in terms of CPU, memory, and storage. Depending on the scenario, different tests such as load, stress, endurance, or spike tests can be leveraged. If the application demands job processing (e.g. email triggers, data synchronization) ensure those are running while the performance test is run.
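To illustrate what a load test measures, here is a minimal harness that drives a callable from several concurrent "virtual users" and reports throughput and response times. It is only a sketch: `fake_request` is a hypothetical stand-in for a real call to the system under test, and dedicated load testing tools offer far more (ramp-up, pacing, distributed load generation).

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_load_test(action, virtual_users: int, requests_per_user: int) -> dict:
    """Call `action` from several threads; report throughput and response times."""

    def user_session(_user_id: int) -> list[float]:
        timings = []
        for _ in range(requests_per_user):
            start = time.perf_counter()
            action()
            timings.append(time.perf_counter() - start)
        return timings

    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=virtual_users) as pool:
        sessions = list(pool.map(user_session, range(virtual_users)))
    elapsed = time.perf_counter() - started

    timings = [t for session in sessions for t in session]
    return {
        "requests": len(timings),
        "throughput_per_sec": len(timings) / elapsed,
        "avg_response_sec": sum(timings) / len(timings),
        "max_response_sec": max(timings),
    }


def fake_request() -> None:
    """Hypothetical stand-in for a real call to the system under test."""
    time.sleep(0.01)


result = run_load_test(fake_request, virtual_users=5, requests_per_user=4)
print(result["requests"])  # 20 requests issued in total
```

Raising `virtual_users` well beyond the expected peak approximates a stress test, and running with a large `requests_per_user` over a long period approximates an endurance test.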

Check

Once the performance testing is complete, compare the results of the newly conducted performance tests with the baseline to identify areas for improvement. Look for any issues that are impacting performance, such as slow SQL queries, missing or inefficient database indexes, large blob reads, or slow-running integrations. Analyze the application alerts or exceptions observed during performance testing. In terms of monitoring availability, understand which functionality is prone to instability. Review production incidents to understand the performance and availability issues faced by the end users.

Act

The act phase is all about refining your application, underlying infrastructure, or network to achieve the performance and availability objectives. On a continual basis, analyze performance testing results, production incidents, monitoring data, and alerts for insights to help achieve the set objectives and goals. Below are some of the areas to investigate for potential optimizations within the application, infrastructure, and network.

Application

  • Memory optimizations
  • Caching
  • Introducing database indexes to improve performance of searches
  • SQL query optimization
  • Leveraging pagination for large results sets
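Two of the application-level items above, database indexes and pagination, can be sketched together using Python's built-in `sqlite3` module. The table, column, and index names are hypothetical, and the example uses keyset pagination (continuing after the last seen id), which is one of several pagination approaches.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE requests (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany(
    "INSERT INTO requests (id, status) VALUES (?, ?)",
    [(i, "open" if i % 2 else "closed") for i in range(1, 101)],
)
# Index supporting the filter and ordering used by the paginated search below.
conn.execute("CREATE INDEX idx_requests_status_id ON requests (status, id)")


def fetch_page(status: str, after_id: int = 0, page_size: int = 10) -> list[tuple]:
    """Keyset pagination: return the next `page_size` rows after `after_id`."""
    return conn.execute(
        "SELECT id, status FROM requests "
        "WHERE status = ? AND id > ? ORDER BY id LIMIT ?",
        (status, after_id, page_size),
    ).fetchall()


first = fetch_page("open")
second = fetch_page("open", after_id=first[-1][0])
print(first[0], second[0])  # (1, 'open') (21, 'open')
```

Fetching one bounded page at a time keeps memory use and response size predictable regardless of how large the full result set grows.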

Infrastructure

  • Load balancing to limit the number of users on any one node to improve performance
  • Multi availability zone/region set up for global applications
  • Content delivery networks
  • Read replicas to divert read only traffic and reporting requirements
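The idea of diverting read-only traffic to replicas can be illustrated with a small router that sends writes to the primary and rotates reads across replicas. This is a simplified sketch with hypothetical host names; it classifies statements only by their leading keyword and ignores concerns such as replication lag and transactions that a real setup would need to handle.

```python
import itertools


class QueryRouter:
    """Send writes to the primary and spread reads across replicas (round-robin)."""

    def __init__(self, primary: str, replicas: list[str]):
        self.primary = primary
        # Fall back to the primary when no replicas are configured.
        self._replicas = itertools.cycle(replicas or [primary])

    def target_for(self, sql: str) -> str:
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._replicas) if is_read else self.primary


# Hypothetical host names.
router = QueryRouter("db-primary", ["db-replica-1", "db-replica-2"])
targets = [
    router.target_for("SELECT * FROM reports"),
    router.target_for("UPDATE requests SET status = 'closed' WHERE id = 7"),
    router.target_for("select count(*) from requests"),
    router.target_for("SELECT 1"),
]
print(targets)
# ['db-replica-1', 'db-primary', 'db-replica-2', 'db-replica-1']
```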

Network

  • Routing network traffic optimally
  • Edge locations/CloudFront

Conclusion

In today’s rapidly changing technology landscape, irrespective of the technology, one thing that remains constant is the need for reliable, performant, and highly available applications. This article uses the PDCA framework to discuss the key activities for achieving the performance and availability goals set for the application. The ‘Plan’ phase focuses on identifying the hardware requirements and infrastructure setup, defining the architecture and design considerations, and defining the performance and monitoring strategy. In the ‘Do’ phase, the focus is on implementing coding best practices, code reviews, performing static and dynamic analysis, and executing performance tests in line with the performance strategy identified in the ‘Plan’ phase. As part of the ‘Check’ phase, the focus is on analyzing the execution results, monitoring data, and incidents to identify areas for continual improvement. The corrective action in the ‘Act’ phase is planned based on the gaps and improvement areas observed in the ‘Check’ phase.

About the Author

Girish Kulkarni is a senior DGM-Software Engineering at John Deere technology center in India. Girish has more than 18 years of experience in Information Technology. His areas of expertise include Pega BPM (business process management), software design, solution architecture, cloud technologies, software quality management, and technology evaluations and implementations. Previously, he was part of the Pega practice at Infosys. His focus areas include DevOps design and implementation, and he maintains a keen interest in performance engineering and DDD (domain-driven design). Girish holds a master's degree in systems management from Mumbai University.
