Failure Content on InfoQ

  • Testing Resiliency at PagerDuty Without a Simian Army

    Doug Barth of PagerDuty talked at DevOps Days London about the company's approach to starting resiliency testing of its systems without dedicating much automation effort upfront. The goal was to quickly start learning about failure points and openly discuss how to fix them with only one hour per week of effort.

  • Learning from Failures with The Lean Startup

    The lean startup is about fast delivery of desired products to customers and increasing your understanding of their needs. With the lean startup, people can learn faster from failures and become better innovators. Some teachers use a lean-startup-based approach in education, which helps their students learn faster.

  • Avoiding Downtime When Cloud Services Fail

    Another AWS outage hit several large websites and their services last week. What can be done to avoid downtime? Architect for failover, not just for scale.

  • Adopting Agile in an Environment of Fear

    Agile adoption and transformation is sometimes effective, and sometimes not. Is there a common thread to the failures? Does fear have anything to do with it? And what can we expect if we start an agile adoption initiative in an environment that is full of fear?

  • All Right It Failed, What Next?

    Usually failures result in anger, frustration and playing the blame game. However, failures are wasted if there is no learning from them. How can Agile teams make failures beautiful?

  • Commercial Interests Censoring Failures

    Philippe Kruchten wrote that "The agile movement is in some ways a bit like a teenager: very self-conscious, checking constantly its appearance in a mirror, accepting few criticisms..." and shared a list of twenty elephants in the room: uncomfortable issues that are ignored on purpose. The first of these unmentionables is that commercial interests are censoring failures.

  • Amazon EC2 Outage Explained and Lessons Learned

    Amazon has published a detailed report on the service failure plaguing one availability zone in the US East Region. The online media is full of analysis, commentary, and lessons to be learned from the event.

  • Lessons Learned from Skype’s Outage

    On December 22nd at 16:00 GMT, the Skype services started to become unavailable, at first for a small share of users, then for more and more, until the network was down for about 24 hours. A week later, Lars Rabbe, CIO at Skype, explained what happened in a post-mortem analysis of the outage.

  • Code is the Culprit! Always?

    Multiple reasons can be cited for the failure of software projects. Some projects fail because of bad requirements, others due to cost and schedule overruns, and a few simply due to bad management. If we do a root cause analysis, would all of the failed projects lead to bad code as the main culprit? Always?

  • Google Apps Has a Marketplace and Instant Failover

    The Google Apps Marketplace allows providers to create applications that integrate with Google Apps. The idea is to allow companies to integrate their own applications with Google’s applications serving some 2 million organizations totaling over 25 million individuals. Google also promises zero data loss and instant failover for Google Apps customers.

  • Scrum/Agile Failings or the Theses of Uncle Bob Martin

    In response to a question about the Inherent Shortcomings of Scrum/Agile, Uncle Bob Martin penned (in the spirit of Martin Luther) 7 theses: No Technical Practices, 30 Day Sprints are too long, Scrum Master sometimes turns into Project Manager, Scrum carries an anti-management undercurrent, and others.

  • Presentation: 10 Ways to Screw Up with Scrum and XP

    In this presentation filmed during Agile 2008, Henrik Kniberg talks about 10 possible reasons to fail while doing Scrum and XP. Maybe the team does not have a definition of what Done means to them, or they don't know what their velocity is, or they don't hold retrospectives.

  • Presentation: "We Suck Less!" Is Not Enough

    In this presentation filmed during Agile 2008, David Douglas and Robin Dymond discuss companies that try to adopt Agile but don't go all the way, resulting in failure and rejection of it, with a predictably negative impact on Agile's future.
