Application Performance Management Content on InfoQ
-
Solving Mysteries Faster with Observability
At QCon Plus, a virtual conference for senior software engineers and architects that covers the trends, best practices, and solutions used by the world's most innovative software organizations, Elizabeth Carretto discussed observability at Netflix and the role of the company's internal tool, Edgar.
-
Using the Plan-Do-Check-Act Framework to Produce Performant and Highly Available Systems
The PDCA (plan-do-check-act) framework can be used to outline the performance, availability, and monitoring practices that help teams deliver performant and highly available applications. These practices span infrastructure design and setup, application architecture and design, coding, performance testing, and application monitoring.
-
Cloud Native and Kubernetes Observability: Expert Panel
InfoQ recently caught up with observability experts to discuss several topics, including what observability really entails, the misconceptions and challenges that users face, the open standards that are influencing the industry, and why interest in this area has grown of late.
-
Site Reliability Engineering Experiences at Instana
With the popularity of distributed architectures, distributed databases, containers, and container orchestrators, an approach that emphasizes automation and a culture of collaboration is a natural fit for modern-day operations. Site Reliability Engineering takes practices that have been established and proven in software engineering and applies them to the field of operations.
-
Software Architecture and Design InfoQ Trends Report—April 2021
An overview of how the InfoQ editorial team sees the Software Architecture and Design topic evolving in 2021, with a focus on what architects are designing for today.
-
Piercing the Fog: Observability Tools from the Future
Gaining visibility into distributed systems and how they are performing is challenging. Despite all the observability tools available for site reliability, debugging remains incredibly difficult, and many SREs would agree that their debugging processes have only marginally improved. This article explores how observability for troubleshooting could be done from the user’s point of view.
-
Performance Tuning Techniques of Hive Big Data Table
In this article, author Sudhish Koloth discusses how to tackle performance problems when using Hive Big Data tables.
-
Training from the Back of the Room and Systems Thinking in Kanban Workshops: Q&A with Justyna Pindel
In the book Kanban Compass, Justyna Pindel shares her experiences of applying training from the back of the room and systems thinking in her Kanban workshops. She adapted her training approach by connecting with attendees and providing them with suitable exercises to maximize learning opportunities.
-
Monitoring Microservices the Right Way
Modern systems are more complex to monitor as they tend to emit large amounts of high cardinality data. Recent innovations in open-source time series databases have improved the scalability of newer monitoring tools such as Prometheus. These solutions are able to handle the high scale of data while providing metric scraping, querying, and visualization based on Prometheus and Grafana.
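
As a rough illustration of why high-cardinality data is hard to handle: the number of distinct time series a single metric can produce is bounded by the product of the number of distinct values of each label. The sketch below uses only the Python standard library; the label names and values are hypothetical and are not taken from the article.

```python
from math import prod


def max_series(label_values):
    """Upper bound on time series for one metric:
    the product of the number of distinct values per label."""
    return prod(len(values) for values in label_values.values())


# Hypothetical labels attached to one request-count metric.
labels = {
    "service": ["cart", "checkout", "search"],
    "status": ["200", "404", "500"],
    "region": ["us", "eu"],
}
print(max_series(labels))  # 18

# Adding one high-cardinality label multiplies the total.
labels["user_id"] = [str(i) for i in range(10_000)]
print(max_series(labels))  # 180000
```

A single unbounded label such as a user ID is usually what turns a manageable metric into a scaling problem.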
-
Instrumenting the Network for Successful AIOps
AIOps platforms empower IT teams to quickly find the root issues that originate in the network and disrupt running applications. AI/ML algorithms need access to high quality network data to determine what went wrong and where. Network visibility starts from TAPs around network equipment, and teams can add application instrumentation and logs as data sources for complete insights.
-
Load Testing APIs and Websites with Gatling: It’s Never Too Late to Get Started
Conducting load tests against APIs and websites can both validate performance after a long stretch of development and provide useful feedback for improving an app's scalability and performance. Engineers should avoid building “the cathedral” of load testing and ending up with little time to improve performance overall. Write the simplest possible test and iterate from there.
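
To make "the simplest possible test" concrete, here is a minimal sketch in plain Python. It does not use Gatling or its DSL. The names `run_load_test` and `fake_endpoint` are hypothetical, and `fake_endpoint` stands in for a real HTTP call so the example runs without a network.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def run_load_test(request_fn, total_requests=50, concurrency=5):
    """Call request_fn total_requests times on a thread pool, timing each call."""

    def timed_call(_):
        start = time.perf_counter()
        try:
            request_fn()
            ok = True
        except Exception:
            # Any exception counts as a failed request.
            ok = False
        return ok, time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(timed_call, range(total_requests)))

    latencies = sorted(elapsed for _, elapsed in results)
    return {
        "requests": len(results),
        "errors": sum(1 for ok, _ in results if not ok),
        "median_s": statistics.median(latencies),
        "max_s": latencies[-1],
    }


def fake_endpoint():
    # Hypothetical stand-in for a real request to the system under test.
    time.sleep(0.001)


if __name__ == "__main__":
    print(run_load_test(fake_endpoint))
```

Starting from a report this small (request count, errors, median and maximum latency) leaves room to iterate, for example by swapping in a real request function or raising the concurrency.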
-
Resilience in Deep Systems
Deep systems, with multiple layers of microservices, have special challenges, and handling them requires the right mindset and tools.