On-call Content on InfoQ

Articles

  • Building an Effective Incident Management Process

    A good incident management framework can help organizations manage the chaos of an outage more effectively, leading to shorter incident durations and tighter feedback loops. This article introduces the components necessary for a healthy incident management process.

  • Software Systems Need Skin in the Game

    Consequential decisions need to be taken by the people who pay for the consequences, by the people with skin in the game, and modern software practices need to reinforce this idea. On-call engineering is the quintessential modern engineering practice to create skin in the software development game.

  • Sustainable Operations in Complex Systems with Production Excellence

    Successful long-term approaches to production ownership and DevOps require cultural change in the form of production excellence. Teams are more sustainable if they have well-defined measurements of reliability, the capability to debug new problems, a culture that fosters spreading knowledge, and a proactive approach to mitigating risk.

  • Observability-Driven Development for Tackling the Great Unknown

    How does observability-driven development differ from monitoring? As our distributed systems become increasingly complicated and as our silos break down for DevOps testing, automation, and efficiency, ODD arises as a superset of monitoring to understand your code’s unknown unknowns. Includes insights from Honeycomb founder Charity Majors.

  • Book Review: Site Reliability Engineering - How Google Runs Production Systems

    "Site Reliability Engineering - How Google Runs Production Systems" is an open window into Google's experience and expertise in running some of the largest IT systems in the world. The book describes the principles that underpin the Site Reliability Engineering discipline. It also details the key practices that allow Google to grow at breakneck speed without sacrificing performance or reliability.