Facilitating the Spread of Knowledge and Innovation in Professional Software Development

Failure Content on InfoQ

  • Can We Trust the Cloud Not to Fail?

    I will start with the theory behind failure detection, and then review a couple of real-world examples of how the mechanism works in a real cloud, Azure. Even though this article covers real-world applications of failure detection within Azure, the same notions also apply to GCP, AWS, or any other distributed system.

  • Q&A on the Book Fail to Learn

    The book Fail to Learn by Scott Provence explores how we can learn from failure and how trainers and course designers can use gamification to foster failure and learning in their educational environments. When playing games it's ok to try out something, lose the game, learn from it, and restart and try something else.

  • Failover Conf Q&A on Building Reliable Systems: People, Process, and Practice

    One of the biggest engineering challenges associated with maintaining or increasing the reliability of a system is knowing where to invest time and energy. InfoQ recently sat down with several engineers and technical leaders who are involved with the upcoming Failover Conf virtual event, and asked their opinion on the best practices for building and running reliable systems.

  • An Engineer’s Guide to a Good Night’s Sleep

    Increased microservices adoption, fueled by the move to the cloud where architectures and infrastructure can flex and be ephemeral, adds complexity every day to the systems we create and maintain. This takes place alongside operating models with autonomous and totally empowered teams, so each distributed system has its own tapestry of technical approaches, languages, and services.

  • The New Killer Apps: Teamwork and Weak Signal Detection Lessons from the Military

    There are a lot of great teamwork and weak signal detection lessons from the military that can help forward-leaning leaders create the organizational agility and safety they need to survive and thrive on their own terms in this VUCA world. This article explores how teamwork and weak signal detection lessons from the military are becoming “The New Killer Apps.”

  • Resilient Systems in Banking

    Resilience is about tolerating failure, not eliminating it. To build a resilient system, you must build a system that absorbs shocks, and continues or recovers. Following best practices for resilient architecture, including established cloud patterns, allowed Starling Bank to build a bank, from scratch, in a year, against a backdrop of highly public outages amongst incumbent banks.

  • Soft Skill Patterns for Software Developers: The “Learning from Unintended Failures” Pattern

    Soft Skill Patterns describe human behaviours that effectively solve recurring problems. The "Learning from Unintended Failures" pattern helps us improve the resilience of a system after a failure. The pattern follows 4 steps: identify a failure, quickly resolve any immediate impact, analyse root cause and system behaviour during the failure, and finally generate and implement improvement ideas.

  • Q&A with Ash Maurya on Scaling Lean

    In the book Scaling Lean, Ash Maurya explores how entrepreneurs can collaborate with stakeholders to establish a business model for a new product or service using Lean Startup principles. It builds on top of his first book, Running Lean, showing how to use experiments, measure business progress, and scale your startup.

  • Adaptable or Predictable? Strive for Both – Be Predictably Adaptable!

    Our efforts to improve software development face the question of what to focus on. Should we govern for predictability without concern for value, maximizing cost efficiency without concern for end-to-end responsiveness? Or should we do the opposite and govern for value over predictability, focusing on responsiveness over cost efficiency? What we really need is to be predictably adaptable.

  • Q&A with Diomidis Spinellis on Effective Debugging

    The book Effective Debugging by Diomidis Spinellis describes 66 different approaches for effective debugging of applications and systems. It provides methods, strategies, techniques, and tools for finding and removing faults, and gives examples for using them in different settings.

  • DevOps at Seamless: The Why, How, and What

    The key thing about DevOps is understanding under which circumstances it should be introduced to your organization. Organizations that adopt DevOps go through a change that affects both processes and culture. This article focuses on why DevOps is needed, what concepts and values should support it, as well as how we implemented it at Seamless, what results we obtained and the challenges we faced.

  • Innovation at Telefónica with Lean Startup

    Creating digital products is different from building traditional telco products: the uncertainty is much higher, the way of creating value for the customer is totally different, and the lifecycle is much faster, says Susana Jurado Apruzzese. Telefónica adapted Lean Startup to its processes, culture, and organization to make it work.
