Agile, Architecture and the 5am Production Problem

Author Michael Nygard counts himself among those who still believe there is such a thing as architecture. In his InfoQ article Agile, Architecture and the 5am Production Problem, Nygard walked the reader through the whodunnit mystery of a real production problem. The surprising conclusion illustrated his message that building applications for the real world, and not just QA, requires a failure-oriented mindset and strong defensive programming tactics. The article poses a challenge to the Agile community's ideas about what constitutes "just enough" architecture.

Nygard's new book "Release It! Design and Deploy Production-Ready Software" from the Pragmatic Programmers has been Amazon's top "Hot New Release" in the Software Design category for the last month. This article expands on a story the author told in the book, explicitly relating it to the Agile approach he has practiced since back when they were called "lightweight methodologies":

Agile methods tell us a lot about how to build functional software that changes easily over time. Programmers created techniques such as unit testing and refactoring for use by other programmers, and they improved the craft as a result. For the most part, though, agile methods focus on the interior of the system boundary. In the agile community, debate continues about how much attention we should pay to the architecture of things outside the application boundary. The most extreme adherent (or should that be "eXtreme" adherents?) say, "Let the architecture emerge from relentless refactoring and vigorous unit testing!"
I am an agile developer and architect, but you should count me ... among those who think architecture must stay grounded in implementation. A good architecture is one that survives contact with the real world. A bad one creaks and groans its way through the day, chewing up people and computers. I have often observed that architects who retreat into abstractions create architecture that cannot be built successfully.

The article told the true story of an interested and unexpected failure that would only occur in the wee hours of the morning after a quiet period on the website: an application that would hang at 5 AM every day, involving a database that was only ever queried. The guilty parties---simultaneously the victims---were a web server, a database server, and a firewall. For those whose first reaction is to think "there's no way to create a deadlock if you're just querying": you'll be interested to see what Nygard uncovered.

Nygard used the story to illustrate what he calls the "failure-oriented mindset", which doesn't mean that he expects his projects to fail, but rather that he builds his systems under the assumption that somehow, someday, every single piece of the architecture will fail. In his book, he recommended building a deviously malicious test harness to simulate many kinds of failures, from simple wire breaks to responding with the wrong protocol.

This article voiced a challenge to the Agile community, where those who espouse the "just-enough-architecture" mantra still don't agree on what this actually means in practice. Certainly, Feature Driven Development and XP, as proposed in their respective canonical articles and books, are far apart on how to deal with this. Agile, Architecture and the 5am Production Problem points to an area in which, Nygard believes, agile methods are conspicuously silent.

Agile has expanded to embrace the Testing discipline, and has more recently been reaching to work out better interactions with disciplines like Technical Writing and Usability. Is the discipline of Architecture another candidate for harmonization with Agile practices, or does Agile already embody adequate principles and practices to build such robust architectures?

InfoQ Software Architects' Newsletter

Write for InfoQ

Rate this Article

This content is in the Agile topic

Related Topics:

Related Editorial

Related Sponsors

Popular across InfoQ

The InfoQ Newsletter