Learning from failure in software

Although not everyone on my team would agree, I dare say that building software is easy. The difficult part starts AFTER your first release, when you need to maintain the application you've built while at the same time introducing new features, keeping technical debt under control, and evolving the application architecture to meet ever-changing business needs, all at minimal cost.

Building software is easy

During the lifetime of our application, failures will happen. Although this is difficult to explain to our business stakeholders, failure in software is inevitable: people make mistakes, requirements are misunderstood, business needs change, and so on. Rather than engaging in a manhunt to avoid failure, we embrace it and focus on learning from our mistakes.

Embrace failure

An important tool in our toolbox here is incident analysis: we need to figure out what happened, what caused the failure and, most importantly, how we can improve. In the complex (distributed) systems we build today, failure is seldom a simple sequence of cause and effect. The good news is that small failures are caused by the same systemic issues as large failures. So start treating every failure as an opportunity to learn and improve.

In that regard I would like to introduce you to https://www.learningfromincidents.io/.

The idea of this website is to create a community that reshapes how the software industry thinks about incidents, software reliability, and the critical role people play in keeping their systems running. If this community is successful, people will be doing and thinking about incident analysis in completely different ways than they were doing it before — as a valuable lens into not only where incidents come from, but what normally prevents them, what people do (and don’t) learn from them, and what makes incidents matter long after the dust has settled.

Learn more about this in their introduction post.

Remark: If you want to learn more about learning from your mistakes, the Google SRE book is a must-read.

The art of simplicity
