Release It!: Design and Deploy Production-Ready Software by Michael Nygard discusses what it takes to make production-ready software, and explains how this differs from feature-complete software. On his website, Nygard described the motivation behind writing this book:
This book comes from my extensive experience living with systems in production. I've often been the one to get woken up at three in the morning when some supposedly 24x7 system goes down.
Other books on design and architecture only tell you how to meet functional requirements. They help your software pass QA. In "Release It!", I'll show you how to make your software production ready. If you don't want to wear an electronic leash, you need this book.
Read Book Excerpt and Review: Release It!
From the article:
InfoQ: There seems to be a tension between the up-front work involved in creating production-ready software and the Agile idea that you do something only when you need to, and refactor as necessary - what are your thoughts on this?
Michael Nygard: As an agile developer, I struggle with this tension myself. I don't have a perfect answer for resolving it, but I think there's a parallel here to good object-oriented design.
Once you've written some code and some unit tests that pass, regardless of which came first, you refactor the code to improve the design. "Improve". Well, what does it mean to improve the design? Doesn't that mean you have to have some notion of "better" and "worse" as it applies to OO design? It does, and that's where Martin Fowler's "code smells" from Refactoring come in. "Code smells" are a qualitative way to talk about better and worse design without getting all hung up on metrics.
I think there's something similar for the architecture. For me, a remote call without a timeout is an "architecture smell". So is a SOAP call or a REST GET that tries to fetch all orders for a customer, without applying a limit.
So, while I do not subscribe to big design up front, or "big architecture up front", I do believe in defining the boundaries within the system, designing failure modes into it, and eliminating "architecture smells" as we encounter them.
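To make the two "architecture smells" Nygard names more concrete (a remote call without a timeout, and a fetch of all orders without a limit), here is a minimal Python sketch of what avoiding them might look like. It is an illustration only, not code from the book: the URL layout, the `limit` and `offset` query parameters, the function names, and the constants `DEFAULT_TIMEOUT_S` and `MAX_PAGE_SIZE` are all assumptions made for the example.

```python
import json
import urllib.request
from urllib.parse import urlencode

DEFAULT_TIMEOUT_S = 2.0  # assumed time budget for one remote call
MAX_PAGE_SIZE = 100      # assumed upper bound on orders per request


def orders_url(base_url, customer_id, limit=50, offset=0):
    """Build a bounded query URL (hypothetical endpoint layout)."""
    if limit < 1:
        raise ValueError("limit must be positive")
    # Clamp the page size so no caller can request an unbounded result set.
    bounded = min(limit, MAX_PAGE_SIZE)
    query = urlencode({"limit": bounded, "offset": offset})
    return f"{base_url}/customers/{customer_id}/orders?{query}"


def fetch_orders(url, timeout_s=DEFAULT_TIMEOUT_S, opener=urllib.request.urlopen):
    """GET the URL with an explicit timeout so a hung server cannot block forever."""
    # `opener` is injectable so the call can be exercised without a network.
    with opener(url, timeout=timeout_s) as response:
        return json.loads(response.read().decode("utf-8"))
```

A caller would combine the two, for example `fetch_orders(orders_url("https://api.example.test", 42, limit=25))`, and handle the timeout error that `urlopen` raises as one of the failure modes designed into the system.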